Conference paper

Coarse-grained sense annotation of Danish across textual domains

Sussi Olsen
University of Copenhagen, Copenhagen, Denmark

Bolette S. Pedersen
University of Copenhagen, Copenhagen, Denmark

Héctor Martínez Alonso
University of Copenhagen, Copenhagen, Denmark

Anders Johannsen
University of Copenhagen, Copenhagen, Denmark


Published in: Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11 May 2015

Linköping Electronic Conference Proceedings 112:6, pp. 36–43

NEALT Proceedings Series 27:6, pp. 36–43


Published: 2015-05-06

ISBN: 978-91-7519-049-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present the results of a coarse-grained sense annotation task covering verbs, nouns and adjectives across six textual domains in Danish. We report domain-wise differences in intercoder agreement and discuss how the applicability and validity of the sense inventory vary by domain. We find that agreement is not higher for highly canonical or edited text. In fact, newswire text and parliament speeches show lower agreement than blogs and chats, probably because the language of these text types is more complex and uses more abstract concepts. We further observe that domains differ in their sense distribution. For instance, newswire and magazines stand out as having a strong focus on persons, while discussion forums typically feature a restricted set of senses tied to their specialized topics. We anticipate that these findings can be exploited in automatic sense tagging under domain shift.
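To make the idea of domain-wise agreement concrete, the following Python sketch computes a chance-corrected agreement score separately for each domain from doubly annotated tokens. It is a minimal illustration under stated assumptions: exactly two annotators, and Cohen's kappa as the coefficient. The abstract does not say which coefficient the study uses, and the function names, the WordNet-style supersense labels and the toy data are hypothetical, not figures from the paper.

```python
from collections import Counter, defaultdict


def cohens_kappa(pairs):
    """Cohen's kappa for two annotators, given a non-empty list of (label_a, label_b) pairs.

    Assumption: Cohen's kappa is used here for illustration only.
    """
    n = len(pairs)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def kappa_by_domain(items):
    """Group hypothetical (domain, label_a, label_b) triples by domain and score each group."""
    by_domain = defaultdict(list)
    for domain, label_a, label_b in items:
        by_domain[domain].append((label_a, label_b))
    return {domain: cohens_kappa(pairs) for domain, pairs in by_domain.items()}


# Hypothetical toy annotations (not data from the paper).
toy_items = [
    ("blog", "noun.person", "noun.person"),
    ("blog", "verb.motion", "verb.motion"),
    ("blog", "noun.artifact", "noun.artifact"),
    ("blog", "noun.person", "noun.person"),
    ("newswire", "noun.person", "noun.group"),
    ("newswire", "noun.act", "noun.act"),
    ("newswire", "noun.group", "noun.person"),
    ("newswire", "noun.act", "noun.act"),
]

print(kappa_by_domain(toy_items))  # {'blog': 1.0, 'newswire': 0.2}
```

Scoring each domain separately, rather than pooling all items, is what lets per-domain differences surface instead of being averaged out. The chance correction matters because a domain with a skewed sense distribution (for example, one dominated by person nouns) would otherwise inflate raw percentage agreement.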

Keywords

sense annotation; sense tagging; sense inventory; supersenses; Danish; textual domains

