Building an open-source development infrastructure for language technology projects

Sjur N. Moshagen
University of Tromsø, Norway

Tommi A. Pirinen
University of Helsinki, Finland

Trond Trosterud
University of Tromsø, Norway

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:31, pp. 343-352

NEALT Proceedings Series 16:31, pp. 343-352

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


The article presents the Giellatekno & Divvun language technology resources, and more specifically the effort to use open-source tools to improve the build infrastructure and the solutions that help adapt it to best practices for software development. In particular, the article discusses how the infrastructure has been remade to cope with an increasing number of languages without incurring extra overhead for the maintainers, while letting the linguists concentrate on the linguistic work. Finally, the article discusses how a uniform infrastructure like the one presented can be used to easily compare languages in terms of morphological or computational complexity or coverage, or used for cross-lingual applications.


Keywords: NoDaLiDa 2013; Infrastructure; Computational linguistics; Finite-state transducers; Language resources; Multilinguality


