Conference article

Enriching a wordnet from a thesaurus

Sanni Nimb
Society for Danish Language and Literature, Denmark

Bolette S. Pedersen
University of Copenhagen, Denmark

Anna Braasch
University of Copenhagen, Denmark

Nicolai Sørensen
Society for Danish Language and Literature, Denmark

Thomas Troelsgård
Society for Danish Language and Literature, Denmark


Published in: Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19

Linköping Electronic Conference Proceedings 88:5, p. 36-50

NEALT Proceedings Series 19:5, p. 36-50


Published: 2013-05-17

ISBN: 978-91-7519-586-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Wordnets are traditionally built around synonym sets, with the vertical hyponymy relation as the central structuring principle. The hyponymy relation, however, does not necessarily group concepts into synsets that are particularly close from a thematic or functional point of view, a phenomenon sometimes referred to as the “ISA overload” or, when viewed from a thematic viewpoint, the “tennis problem”. In this paper we present two experiments. The first concerns a method for remedying these problems by transferring thematic information from a thesaurus to a wordnet (Danish Thesaurus to DanNet). In this way we can automatically subdivide co-hyponyms thematically as well as relate synsets thematically across parts of speech. Since the thesaurus is not yet complete, the paper describes work in progress; nevertheless, with an error rate below 5% for the most coarse-grained transferred themes, the experiment appears very promising. The second experiment concerns the extension of DanNet via the Danish Thesaurus: the thematic organisation of the thesaurus into near-synonyms is further applied as a very precise method for automatically extending the lexical coverage of DanNet.
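The idea of subdividing co-hyponyms thematically can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hypernym, the words, the theme labels, and the function name `subdivide_by_theme` are all hypothetical toy values chosen for illustration, and they do not reflect actual DanNet or Danish Thesaurus content.

```python
from collections import defaultdict

# Hypothetical toy data (an assumption, not real DanNet or thesaurus entries):
# co-hyponyms listed under one hypernym, and a thesaurus lookup word -> theme.
co_hyponyms = {"person": ["tennis player", "umpire", "surgeon", "nurse"]}
thesaurus_theme = {
    "tennis player": "sport",
    "umpire": "sport",
    "surgeon": "health",
    "nurse": "health",
}


def subdivide_by_theme(hyponyms, theme_of):
    """Group co-hyponyms by the theme the thesaurus assigns to each word."""
    groups = defaultdict(list)
    for word in hyponyms:
        # Words missing from the thesaurus go into an explicit 'unknown' group.
        groups[theme_of.get(word, "unknown")].append(word)
    return dict(groups)


grouped = subdivide_by_theme(co_hyponyms["person"], thesaurus_theme)
print(grouped)
```

In this toy setting, words that share only the hypernym "person" in the wordnet are split into thematic groups, which is the kind of information the hyponymy hierarchy alone does not provide.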

Keywords

Wordnet; “tennis problem”; ISA overload; thesaurus; thematic information
