Coarse-grained sense annotation of Danish across textual domains

Sussi Olsen
University of Copenhagen, Copenhagen, Denmark

Bolette S. Pedersen
University of Copenhagen, Copenhagen, Denmark

Héctor Martínez Alonso
University of Copenhagen, Copenhagen, Denmark

Anders Johannsen
University of Copenhagen, Copenhagen, Denmark

In: Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping Electronic Conference Proceedings 112:6, pp. 36–43

NEALT Proceedings Series 27:6, pp. 36–43


Published: 2015-05-06

ISBN: 978-91-7519-049-5

ISSN: 1650-3686 (print), 1650-3740 (online)


We present the results of a coarse-grained sense annotation task on verbs, nouns and adjectives across six textual domains in Danish. We report the domain-wise differences in intercoder agreement and discuss how the applicability and validity of the sense inventory vary by domain. We find that domain-wise agreement is not higher in very canonical or edited text. In fact, newswire text and parliament speeches have lower agreement than blogs and chats, probably because the language of these text types is more complex and uses more abstract concepts. We further observe that domains differ in their sense distribution. For instance, newswire and magazines stand out as having a high focus on persons, and discussion fora typically include a restricted number of senses dependent on specialized topics. We anticipate that these findings can be exploited in automatic sense tagging when dealing with domain shift.


sense annotation; sense tagging; sense inventory; supersenses; Danish; textual domains


