Conference article

Technical Solutions for Reproducible Research

Alexander König
Eurac Research, Italy / CLARIN ERIC, the Netherlands

Egon W. Stemle
Eurac Research, Italy

André Moreira
CLARIN ERIC, the Netherlands

Willem Elbers
CLARIN ERIC, the Netherlands

Published in: Selected Papers from the CLARIN Annual Conference 2019

Linköping Electronic Conference Proceedings 172:9, p. 66-74

Published: 2020-07-03

ISBN: 978-91-7929-807-4

ISSN: 1650-3686 (print), 1650-3740 (online)


In recent years, the reproducibility of scientific research has come under increasing scrutiny, both from external stakeholders (e.g. funders) and from within the research communities themselves. Corpus linguistics plays a special role here, because its methods for creating, processing and analysing corpora are an integral part of many other disciplines that work with language data. Moreover, language corpora are often living objects that are regularly improved and revised. At the same time, tools for the automatic processing of human language are also being developed further, so that the same processing steps applied to the same data can yield different results. This article argues that modern software technologies, such as version control and containerisation, can mitigate two sets of problems: software packaging, installation and execution, and, equally important, the tracking of modifications to a corpus throughout its life cycle. All in all, this makes changes to raw data and software tools transparent and thereby enhances reproducibility.


Keywords: reproducibility, containerisation, corpus linguistics

