Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Flavio Massimiliano Cecchini
DISCo, Università degli Studi di Milano-Bicocca, Italy

Martin Riedl
Language Technology Group, Universität Hamburg, Germany

Chris Biemann
Language Technology Group, Universität Hamburg, Germany


Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:13, pp. 105-114

NEALT Proceedings Series 29:13, pp. 105-114


Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. Both consist of word-centered graphs for each of 1225 different pseudowords; one uses first-order co-occurrences and the other second-order semantic similarities. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI), focused on the case of coarse-grained homonymy: we compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplank-based clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new, dedicated evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses.
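The core idea of the framework, scoring a clustering of a pseudoword's graph against the known components of that pseudoword, can be illustrated with a small sketch. The code below is not the evaluation measure defined in the paper; it uses BCubed precision and recall, a common extrinsic clustering measure, as a stand-in. The pseudoword "banana_door", its neighbour words, and the function name `bcubed` are hypothetical and chosen only for illustration. It assumes a hard clustering, where each node belongs to exactly one cluster.

```python
from collections import defaultdict


def bcubed(clusters, gold):
    """Return BCubed (precision, recall, f1) for a hard clustering.

    clusters: dict mapping node -> induced cluster id
    gold:     dict mapping node -> ground-truth component of the pseudoword
    Only nodes present in both mappings are scored.
    """
    nodes = [n for n in clusters if n in gold]
    if not nodes:
        raise ValueError("no nodes shared between clustering and ground truth")

    # Group nodes by induced cluster and by ground-truth component.
    by_cluster = defaultdict(set)
    by_gold = defaultdict(set)
    for n in nodes:
        by_cluster[clusters[n]].add(n)
        by_gold[gold[n]].add(n)

    precision = 0.0
    recall = 0.0
    for n in nodes:
        same_cluster = by_cluster[clusters[n]]
        same_gold = by_gold[gold[n]]
        overlap = len(same_cluster & same_gold)  # always >= 1 (n itself)
        precision += overlap / len(same_cluster)
        recall += overlap / len(same_gold)

    precision /= len(nodes)
    recall /= len(nodes)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical pseudoword "banana_door": each neighbour in its graph is
# labelled with the component word it originally belonged to.
gold = {
    "yellow": "banana", "fruit": "banana", "peel": "banana",
    "handle": "door", "window": "door", "knob": "door",
}

# Hypothetical output of some clustering algorithm: "knob" was misplaced.
induced = {
    "yellow": 0, "fruit": 0, "peel": 0, "knob": 0,
    "handle": 1, "window": 1,
}

p, r, f = bcubed(induced, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because the ground truth of a pseudoword is known by construction, a score of this kind can be computed for every pseudoword and every algorithm without manual sense annotation, which is what makes a large-scale comparison feasible.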




