Conference article

Nefnir: A high accuracy lemmatizer for Icelandic

Svanhvít Ingólfsdóttir
Department of Computer Science, Reykjavik University, Iceland

Hrafn Loftsson
Department of Computer Science, Reykjavik University, Iceland

Jón Daðason
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland

Kristín Bjarnadóttir
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland


Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30—October 2, 2019, Turku, Finland

Linköping Electronic Conference Proceedings 167:33, pp. 310–315

NEALT Proceedings Series 42:33, pp. 310–315


Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
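As a rough illustration of how suffix substitution rules can be derived from (word form, tag, lemma) entries and then applied to tagged text, here is a minimal Python sketch. It is not Nefnir's implementation: the rule format (strip a suffix from the word form, append a suffix to form the lemma), keying rules on the tag plus the longest matching word-form suffix, breaking conflicts by rule frequency, the names `derive_rule` and `SuffixLemmatizer`, and the sample entries with IFD-style tags are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def derive_rule(form: str, lemma: str) -> tuple[str, str]:
    """Return (suffix to strip from form, suffix to append) after the longest common prefix."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


class SuffixLemmatizer:
    """Hypothetical suffix-substitution lemmatizer for tagged text (illustrative sketch only)."""

    def __init__(self, entries):
        # (tag, word-form suffix) -> counts of (strip, append) rules observed with that suffix
        self.rules = defaultdict(Counter)
        for form, tag, lemma in entries:
            strip, append = derive_rule(form, lemma)
            # Index the rule under every suffix of the form that contains the stripped part,
            # from the stripped part itself up to the whole word form.
            for k in range(len(strip), len(form) + 1):
                self.rules[(tag, form[len(form) - k:])][(strip, append)] += 1

    def lemmatize(self, word: str, tag: str) -> str:
        # Try the longest matching suffix first: a whole-form match behaves like a
        # dictionary lookup, while shorter suffixes generalise to unseen words.
        for k in range(len(word), -1, -1):
            counts = self.rules.get((tag, word[len(word) - k:]))
            if counts:
                # Assumed tie-breaking: use the most frequent rule for this suffix and tag.
                strip, append = counts.most_common(1)[0][0]
                if word.endswith(strip):
                    return word[: len(word) - len(strip)] + append
        return word  # no applicable rule: fall back to the word form itself


# Illustrative entries with IFD-style tags (e.g. "nkeþg" = masculine singular dative noun with article).
entries = [
    ("hestinum", "nkeþg", "hestur"),
    ("hesti", "nkeþ", "hestur"),
    ("hestar", "nkfn", "hestur"),
]
lem = SuffixLemmatizer(entries)
print(lem.lemmatize("hundinum", "nkeþg"))  # hundur
```

Because rules are keyed on the tag as well as the suffix, a wrong tag can select the wrong rule, which is consistent with the abstract's observation that accuracy is higher on correctly tagged text than on automatically tagged text.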

Keywords

lemmatization, morphologically rich languages, morphological database
