Svanhvít Ingólfsdóttir
Department of Computer Science, Reykjavik University, Iceland
Hrafn Loftsson
Department of Computer Science, Reykjavik University, Iceland
Jón Daðason
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland
Kristín Bjarnadóttir
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland
Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland
Linköping Electronic Conference Proceedings 167:33, p. 310--315
NEALT Proceedings Series 42:33, p. 310--315
Published: 2019-10-02
ISBN: 978-91-7929-995-8
ISSN: 1650-3686 (print), 1650-3740 (online)
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open-source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. In our evaluation, Nefnir achieves an accuracy of 99.55% on correctly tagged text and 96.88% on text tagged automatically with a PoS tagger.
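To illustrate the general idea of suffix substitution rules, the following is a minimal sketch in Python. It is not Nefnir's actual implementation: the function names (`derive_rules`, `lemmatize`), the tag labels, and the three-entry toy lexicon are all assumptions made for this example. The sketch derives rules of the form "for this tag, replace this word-form suffix with this lemma suffix" from (form, tag, lemma) triples, then lemmatizes an unseen word by applying the rule with the longest matching suffix.

```python
from collections import Counter, defaultdict


def common_prefix_len(a, b):
    """Return the length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def derive_rules(entries):
    """Build {(tag, form_suffix): lemma_suffix} from (form, tag, lemma) triples.

    For each entry, every suffix of the form that covers the part differing
    from the lemma yields one candidate rule. When candidates conflict, the
    most frequent lemma suffix is kept.
    """
    votes = defaultdict(Counter)
    for form, tag, lemma in entries:
        k = common_prefix_len(form, lemma)
        for start in range(k + 1):
            votes[(tag, form[start:])][lemma[start:]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def lemmatize(form, tag, rules):
    """Apply the rule with the longest matching suffix; fall back to the form."""
    for start in range(len(form) + 1):
        replacement = rules.get((tag, form[start:]))
        if replacement is not None:
            return form[:start] + replacement
    return form


# Hypothetical toy lexicon: inflected forms of "hestur" (horse).
# The tag labels are invented for this sketch.
TOY_ENTRIES = [
    ("hestum", "N-DAT-PL", "hestur"),
    ("hesti", "N-DAT-SG", "hestur"),
    ("hestar", "N-NOM-PL", "hestur"),
]

rules = derive_rules(TOY_ENTRIES)

# "prestum" and "gesti" are not in the toy lexicon, but they share
# suffixes with known forms, so the derived rules still apply.
print(lemmatize("prestum", "N-DAT-PL", rules))  # prestur
print(lemmatize("gesti", "N-DAT-SG", rules))    # gestur
```

The tag is part of the rule key because the same surface suffix can map to different lemma suffixes depending on the morphological analysis, which is why a rule-based approach of this kind operates on tagged text and why tagging errors lower lemmatization accuracy.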