LaMachine: A meta-distribution for NLP software

Maarten van Gompel
Centre for Language and Speech Technology (CLST), Radboud University, Nijmegen, The Netherlands

Iris Hendrickx
Centre for Language and Speech Technology (CLST), Radboud University, Nijmegen, The Netherlands


Included in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:22, pp. 214-226


Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)


We introduce LaMachine, a unified open-source software distribution for Natural Language Processing (NLP) that facilitates the installation and deployment of a large number of software projects developed in the scope of the CLARIN-NL project and its current successor, CLARIAH. Special attention is paid to encouraging good software development practices and to reusing established infrastructure from the scientific and open-source software development communities. We explain what LaMachine is, how it can be used, and how it works technically. We also compare LaMachine to alternative software distributions and discuss its advantages and limitations. We illustrate its use in two case studies: an exploratory text mining project at the Dutch Health Inspectorate, where LaMachine was applied to create a research environment for automatic text analysis in health care quality monitoring, and a second case in which LaMachine was used to create a workspace for a one-week, intensive collaboration by a diverse research team.


Software distribution, Software metadata, Virtual research environment, Virtual laboratory, Infrastructure


