Omorfi—Free and open source morphological lexical database for Finnish

Tommi A Pirinen
Ollscoil Chathair Bhaile Átha Cliath, ADAPT Centre - School of Computing, Dublin City University, Dublin, Ireland


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:44, p. 313-315

NEALT Proceedings Series 23:44, p. 313-315


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


This demonstration presents omorfi, a freely available open source lexical database. Omorfi is a mature lexicographical database project. It started out as a single-person, single-purpose free open source morphological analyser project and has since grown to be used in a variety of applications, including spell-checking, statistical and rule-based machine translation, treebanking, joint syntactic and morphological parsing, poetry generation, and information extraction. In this demonstration we hope to show both the variety of end-user-facing applications and the tools and interfaces that let computational linguists make the best use of a developing product. We show a shallow database arrangement that has allowed a great variety of contributors from different projects to extend the lexical database without breaking the continued use of existing end applications. We hope to show the best current practices for both lexical data management and software engineering with regard to the continuous integration of external projects with a constantly developing product. As case examples we show some of the integrations with the following applications: Voikko spell-checking for Windows, Mac OS X, Linux and Android; statistical machine translation pipelines with Moses; rule-based machine translation with Apertium; traditional Xerox-style morphological analysis and generation; and morphological segmentation, as well as application programming interfaces for Python and Java.




