Conference article

Automatic Lemmatisation of Lithuanian MWEs

Loïc Boizou
Centre of Computational Linguistics, Vytautas Magnus University, Kaunas, Lithuania

Jolanta Kovalevskaitė
Centre of Computational Linguistics, Vytautas Magnus University, Kaunas, Lithuania

Erika Rimkutė
Centre of Computational Linguistics, Vytautas Magnus University, Kaunas, Lithuania


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:8, pp. 41-49

NEALT Proceedings Series 23:8, pp. 41-49


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This article presents a study of the lemmatisation of flexible multiword expressions (MWEs) in Lithuanian. An approach based on syntactic analysis, originally designed for multiword term lemmatisation, was adapted to a broader range of MWEs taken from the Dictionary of Lithuanian Nominal Phrases. The analysis identifies the main lemmatisation errors and proposes some improvements. It shows that automatic lemmatisation can be improved by taking into account the whole set of grammatical forms attested for each MWE, which would make it possible to select the optimal grammatical form for lemmatisation and to identify some grammatical restrictions.


References

Loïc Boizou, Gintarė Grigonytė, Erika Rimkutė, and Andrius Utka. 2012. Automatic Inference of Base Forms for Multiword Terms in Lithuanian. In Proceedings of the Fifth International Conference Human Language Technologies – The Baltic Perspective, pages 27–35.

Vidas Daudaravičius and Rūta Marcinkevičienė. 2004. Gravity Counts for the Boundaries of Collocations. International Journal of Corpus Linguistics, 9(2):321–348.

Jolanta Kovalevskaitė. 2014. Phraseme-type and Phraseme-token: a Corpus-driven Evidence for Morphological Flexibility of Phrasemes. Res Humanitariae, XVI, pages 126–143.

Elizaveta Loginova, Anita Gojun, Helena Blancafort, Marie Guégan, Tatiana Gornostay, and Ulrich Heid. 2012. Reference Lists for the Evaluation of Term Extraction Tools. In Proceedings of the 10th International Congress on Terminology and Knowledge Engineering (TKE), pages 177–192, Madrid, Spain.

Rūta Marcinkevičienė. 2010. Lietuvių kalbos kolokacijos. Vytauto Didžiojo universitetas, Kaunas, Lithuania.

Jonas Paulauskas (ed). 2001. Frazeologijos žodynas. Lietuvių kalbos institutas, Vilnius, Lithuania.

Erika Rimkutė and Vidas Daudaravičius. 2007. Morfologinis dabartinės lietuvių kalbos tekstyno anotavimas. Kalbų studijos, 11:30–35.

Erika Rimkutė, Agnė Bielinskienė, and Jolanta Kovalevskaitė (eds). 2012. Lietuvių kalbos daiktavardinių frazių žodynas. Vytauto Didžiojo universitetas, Kaunas, Lithuania.

Gregor Thurmair and Vera Aleksic. 2012. Creating Term and Lexicon Entries from Phrase Tables. In Proceedings of the 16th EAMT Conference, pages 253–260.

Vytautas Zinkevičius, Vidas Daudaravičius, and Erika Rimkutė. 2005. The Morphologically Annotated Lithuanian Corpus. In Proceedings of The Second Baltic Conference on Human Language Technologies, pages 365–370.

Vytautas Zinkevičius. 2000. Lemuoklis – morfologinei analizei. Darbai ir Dienos, 24:245–273.
