Conference article

Nordic and Baltic wordnets aligned and compared through “WordTies”

Bolette S. Pedersen
University of Copenhagen, Copenhagen, Denmark

Lars Borin
University of Gothenburg, Gothenburg, Sweden

Markus Forsberg
University of Gothenburg, Gothenburg, Sweden

Neeme Kahusk
University of Tartu, Tartu, Estonia

Krister Lindén
University of Helsinki, Finland

Jyrki Niemi
University of Helsinki, Finland

Niklas Nisbeth
University of Copenhagen, Copenhagen, Denmark

Lars Nygaard
Kaldera Language Technology, Oslo, Norway

Heili Orav
University of Tartu, Tartu, Estonia

Eiríkur Rögnvaldsson
University of Iceland, Iceland

Mitchell Seaton
University of Copenhagen, Copenhagen, Denmark

Kadri Vider
University of Tartu, Tartu, Estonia

Kaarlo Voionmaa
University of Gothenburg, Gothenburg, Sweden


Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:16, pp. 147-162

NEALT Proceedings Series 16:16, pp. 147-162


Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

During the last few years, extensive wordnets have been built locally for the Nordic and Baltic languages, applying very different compilation strategies. The aim of the present investigation is to consolidate and examine these wordnets through an alignment via Princeton Core WordNet and thereby compare them along the measures of taxonomical structure, synonym structure, and assigned relations in order to approximate a best practice. A common web interface and visualizer, “WordTies”, is developed to facilitate this purpose. Four bilingual wordnets are automatically processed and evaluated, exposing interesting differences between the wordnets. Even though the alignments are judged to be of good quality, the precision of the translations varies due to considerable differences in hyponymy depth and interpretation of the synset. All seven monolingual and four bilingual wordnets, as well as WordTies, have been made available via META-SHARE through the META-NORD project.

Keywords

Wordnets; multilingual links; wordnet web interface; Nordic and Baltic languages; META-NORD.

