Supersense tagging for Danish

Héctor Martínez Alonso
University of Copenhagen, Denmark

Anders Johannsen
University of Copenhagen, Denmark

Sussi Olsen
University of Copenhagen, Denmark

Sanni Nimb
Danish Society of Language and Literature, Christians Brygge 1, Copenhagen, Denmark

Nicolai Hartvig Sørensen
Danish Society of Language and Literature, Christians Brygge 1, Copenhagen, Denmark

Anna Braasch
University of Copenhagen, Denmark

Anders Søgaard
University of Copenhagen, Denmark

Bolette Sandford Pedersen
University of Copenhagen, Denmark

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:6, pp. 21-29

NEALT Proceedings Series 23:6, pp. 21-29

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


We describe the creation of a new Danish resource for automated coarse-grained word sense disambiguation of running text (supersense tagging, SST). Based on corpus evidence, we expand the sense inventory to incorporate new lexical classes. We also add tags for verbal satellites such as collocates, particles and reflexive pronouns, to account for the satellite-framing properties of Danish. Finally, we evaluate the quality of our expanded sense inventory in terms of variation in $F_1$ on a state-of-the-art SST system. The initial release is a 1,500-sentence corpus covering six genres, made available under an open-source license.
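As an illustration of the evaluation measure mentioned above, the following is a minimal sketch of a token-level, micro-averaged $F_1$ over supersense tags. It is not the scoring procedure used in the paper: the function name `micro_f1`, the `O` label for untagged tokens, and the example tags are assumptions made for this sketch only.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str], outside: str = "O") -> float:
    """Token-level micro-averaged F1, ignoring the (assumed) outside label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # True positives: tokens where the predicted tag matches a non-outside gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    # Number of tokens the system tagged, and number tagged in the gold standard.
    n_pred = sum(1 for p in pred if p != outside)
    n_gold = sum(1 for g in gold if g != outside)

    if tp == 0:
        return 0.0

    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags for a four-token sentence.
gold_tags = ["noun.person", "O", "verb.motion", "noun.location"]
pred_tags = ["noun.person", "noun.artifact", "O", "noun.location"]
score = micro_f1(gold_tags, pred_tags)
```

Under these assumptions, a variation in $F_1$ between two sense inventories would be the difference between the scores obtained when the same system is trained and evaluated with each inventory.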


