Conference article

Supersense tagging for Danish

Héctor Martínez Alonso
University of Copenhagen, Denmark

Anders Johannsen
University of Copenhagen, Denmark

Sussi Olsen
University of Copenhagen, Denmark

Sanni Nimb
Danish Society of Language and Literature, Christians Brygge 1, Copenhagen, Denmark

Nicolai Hartvig Sørensen
Danish Society of Language and Literature, Christians Brygge 1, Copenhagen, Denmark

Anna Braasch
University of Copenhagen, Denmark

Anders Søgaard
University of Copenhagen, Denmark

Bolette Sandford Pedersen
University of Copenhagen, Denmark


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:6, pp. 21-29

NEALT Proceedings Series 23:6, pp. 21-29


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We describe the creation of a new Danish resource for automated coarse-grained word sense disambiguation of running text (supersense tagging, SST). Based on corpus evidence, we expand the sense inventory to incorporate new lexical classes. We also add tags for verbal satellites such as collocates, particles and reflexive pronouns, to account for the satellite-framing properties of Danish. Finally, we evaluate the quality of our expanded sense inventory in terms of variation in $F_1$ on a state-of-the-art SST system. The initial release is a 1,500-sentence corpus covering six genres, made available under an open-source license.
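The abstract does not say how $F_1$ is computed, so the following is only a minimal sketch of what "variation in $F_1$" between two sense inventories could look like. It assumes token-level, micro-averaged scoring in which an outside tag `O` marks untagged tokens. The function names `micro_f1` and `f1_variation` and the example tags are illustrative and are not taken from the paper.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str], outside: str = "O") -> float:
    """Token-level micro-averaged F1 over all tags except the outside tag.

    Assumption: one tag per token; the paper's own scorer may differ.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A true positive is a non-outside prediction that matches the gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    n_pred = sum(1 for p in pred if p != outside)  # tokens the system tagged
    n_gold = sum(1 for g in gold if g != outside)  # tokens tagged in the gold data
    if n_pred == 0 or n_gold == 0 or tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def f1_variation(
    gold_base: Sequence[str],
    pred_base: Sequence[str],
    gold_ext: Sequence[str],
    pred_ext: Sequence[str],
) -> float:
    """Difference in F1 between an extended inventory and a baseline inventory."""
    return micro_f1(gold_ext, pred_ext) - micro_f1(gold_base, pred_base)


# Illustrative tags only (WordNet-style supersense names), not data from the corpus.
gold = ["noun.person", "O", "verb.motion", "O"]
pred = ["noun.person", "O", "verb.cognition", "noun.food"]
score = micro_f1(gold, pred)
```

A positive value from `f1_variation` would mean the tagger scores higher with the extended inventory than with the baseline under this scoring assumption.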


