Conference article

Analysing Inconsistencies and Errors in PoS Tagging in two Icelandic Gold Standards

Steinþór Steingrímsson
The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland

Sigrún Helgadóttir
The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland

Eiríkur Rögnvaldsson
University of Iceland, Reykjavík, Iceland


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:38, p. 287-291

NEALT Proceedings Series 23:38, p. 287-291


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes work in progress. We experiment with training a state-of-the-art tagger, Stagger, on a new gold standard, MIM-GOLD, for the PoS tagging of Icelandic. We compare the results with those obtained using a previous gold standard, IFD. Using MIM-GOLD, tagging accuracy is considerably lower: 92.76% compared to 93.67% for IFD, a difference of 0.91 percentage points. We analyze and classify the errors made by the tagger in order to explain this difference. We find that inconsistencies and incorrect tags in MIM-GOLD may account for it.
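As an illustration of the kind of comparison the abstract describes, the following minimal sketch computes token-level tagging accuracy and tallies the tagger's errors by (gold tag, predicted tag) pair. It is not the authors' evaluation code: the function names `tagging_accuracy` and `confusion_pairs` are hypothetical, and the tags in the toy data are invented rather than drawn from IFD or MIM-GOLD. Only the two accuracy figures in the last step come from the abstract.

```python
from collections import Counter


def tagging_accuracy(gold_tags, predicted_tags):
    """Return token-level accuracy as a fraction in [0, 1]."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def confusion_pairs(gold_tags, predicted_tags):
    """Count (gold, predicted) pairs for the tokens the tagger got wrong."""
    return Counter(
        (g, p) for g, p in zip(gold_tags, predicted_tags) if g != p
    )


# Toy data with invented tags (an assumption for illustration only).
gold = ["N", "V", "N", "ADJ", "N"]
pred = ["N", "V", "ADJ", "ADJ", "V"]

accuracy = tagging_accuracy(gold, pred)
errors = confusion_pairs(gold, pred)

# Gap between the two accuracies reported in the abstract, in percentage points.
gap = round(93.67 - 92.76, 2)

print(f"accuracy: {accuracy:.2%}")
for (gold_tag, pred_tag), count in errors.most_common():
    print(f"gold={gold_tag} predicted={pred_tag}: {count}")
print(f"IFD minus MIM-GOLD: {gap} percentage points")
```

Sorting the error pairs by frequency is one simple way to see which tag confusions dominate. Deciding whether a given mismatch is a tagger error or an incorrect gold tag still requires manual inspection.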


