Here be dragons? The perils and promises of inter-resource lexical-semantic mapping

Lars Borin
Språkbanken, Department of Swedish, University of Gothenburg, Sweden

Richard Johansson
Språkbanken, Department of Swedish, University of Gothenburg, Sweden

Luis Nieto Piña
Språkbanken, Department of Swedish, University of Gothenburg, Sweden


Published in: Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping Electronic Conference Proceedings 112:2, pp. 1–11

NEALT Proceedings Series 27:2, pp. 1–11


Published: 2015-05-06

ISBN: 978-91-7519-049-5

ISSN: 1650-3686 (print), 1650-3740 (online)


Lexical-semantic knowledge sources are a stock item in the language technologist’s toolbox, having proved their practical worth in many and diverse natural language processing (NLP) applications. In linguistics, lexical semantics comes in many flavors, but in the NLP world, wordnets reign more or less supreme. There has been some promising work utilizing Roget-style thesauruses instead, but wider experimentation is hampered by the limited availability of such resources. The work presented here is a first step in the direction of creating a freely available Roget-style lexical resource for modern Swedish. Here, we explore methods for automatic disambiguation of inter-resource mappings with the longer-term goal of utilizing similar techniques for automatic enrichment of lexical-semantic resources.


thesaurus; word sense disambiguation; inter-resource mapping; corpus-based word semantics; lexicon-based word semantics; SALDO; Roget

