Morphological analysis with limited resources: Latvian example

Pēteris Paikens
University of Latvia, Institute of Mathematics and Computer Science, Latvia

Laura Rituma
University of Latvia, Institute of Mathematics and Computer Science, Latvia

Lauma Pretkalniņa
University of Latvia, Institute of Mathematics and Computer Science, Latvia

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:24, pp. 267-277

NEALT Proceedings Series 16:24, pp. 267-277

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


We describe an approach to morphological analysis that combines a rule-based word-level morphological analyzer with statistical tagging, and detail its application to the Latvian language. Latvian is a highly inflective Indo-European language with a rich morphology.

The tools described here include an implementation of Latvian inflectional paradigms, a morphological analysis tool with a guessing module for out-of-vocabulary words, and a statistical POS/morphology tagger that disambiguates between multiple possible analyses. With a training set of only ~40 000 words, the currently achieved accuracy is 97.9% for part-of-speech tagging and 93.6% for the full morphological feature tag set, which is better than any previously publicly available tagger for Latvian.

We also describe the construction and methodology of the necessary linguistic resources (a morphological dictionary and an annotated morphological corpus) and evaluate the effect of resource size on analysis accuracy, showing what results can be achieved with limited linguistic resources.


Morphology; inflective language; POS tagging; Latvian language; morphological corpus


