Clustering word senses from semantic mirroring data

Hamps Lilliehöök
Department of Computer and Information Science, Linköping University, Sweden

Magnus Merkel
Department of Computer and Information Science, Linköping University, Sweden

Published in: Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19

Linköping Electronic Conference Proceedings 88:4, pp. 21-35

NEALT Proceedings Series 19:4, pp. 21-35

Published: 2013-05-17

ISBN: 978-91-7519-586-5

ISSN: 1650-3686 (print), 1650-3740 (online)


In this article we describe work on creating word clusters in two steps. First, a graph-based approach to semantic mirroring is used to create primary synonym clusters from a bilingual lexicon. Second, the data is represented by vectors in a large vector space, and a resource of synonym clusters is then constructed by performing K-means centroid-based clustering on the vectors. We evaluate the results automatically against WordNet and evaluate a sample of word clusters manually. Prospects and applications of the approach are also discussed.
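The second step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy English-Swedish lexicon, the binary translation vectors, the seed words used as initial centroids, and all function names are assumptions made for illustration, and the semantic mirroring step itself is not reproduced. The sketch only shows how plain K-means (Lloyd's algorithm) could group words whose vectors are similar.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy English-to-Swedish lexicon (not data from the paper).
LEXICON: Dict[str, Set[str]] = {
    "quick": {"snabb", "rask"},
    "fast": {"snabb", "rask"},
    "rapid": {"snabb", "hastig"},
    "happy": {"glad", "lycklig"},
    "glad": {"glad"},
    "cheerful": {"glad", "munter"},
}


def build_vectors(lexicon: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """Assumed representation: one binary dimension per target-language word."""
    targets = sorted({t for ts in lexicon.values() for t in ts})
    return {
        word: [1.0 if t in ts else 0.0 for t in targets]
        for word, ts in lexicon.items()
    }


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]],
           initial_centroids: List[List[float]],
           max_iter: int = 100) -> List[int]:
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    centroids = [list(c) for c in initial_centroids]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_words(lexicon: Dict[str, Set[str]],
                  seeds: List[str]) -> List[List[str]]:
    """Cluster the source words, using the seed words as initial centroids."""
    vectors = build_vectors(lexicon)
    words = list(vectors)
    labels = kmeans([vectors[w] for w in words], [vectors[s] for s in seeds])
    clusters: List[List[str]] = [[] for _ in seeds]
    for word, label in zip(words, labels):
        clusters[label].append(word)
    return clusters


clusters = cluster_words(LEXICON, seeds=["quick", "happy"])
print(clusters)
```

On this toy input the speed-related words and the mood-related words fall into separate clusters. The centroids are seeded with fixed words only to keep the example deterministic; a real run would more likely use random or k-means++ initialization and a library implementation such as the one in SciPy.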


Word senses; clustering; semantic mirroring


Bansal, M., DeNero, J., and Lin, D. (2012). Unsupervised translation sense clustering.

Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media.

Cicurel, L., Bloehdorn, S., and Cimiano, P. (2006). Clustering of polysemic words. In GfKl’06, pages 595–602.

Dyvik, H. (2004). Translations as semantic mirrors: From parallel corpus to wordnet. Language and Computers, 49(1):311–326.

Eldén, L., Merkel, M., Ahrenberg, L., and Fagerlund, M. (2013). Computing semantic clusters by semantic mirroring and spectral graph partitioning. Manuscript, submitted for publication.

Fagerlund, M., Merkel, M., Eldén, L., and Ahrenberg, L. (2010). Computing word senses by semantic mirroring and spectral graph partitioning. In Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing, pages 103–107.

Jones, E., Oliphant, T., Peterson, P., et al. (2001). SciPy: Open source scientific tools for Python.

Jurafsky, D. and Martin, J. H. (2009). Speech and Language Processing. Pearson/Prentice Hall.

Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38:39–41.

Norstedts (2000). Norstedts stora engelsk-svenska ordbok. Norstedts.

Pérez, F. and Granger, B. E. (2007). IPython: a System for Interactive Scientific Computing. Comput. Sci. Eng., 9(3):21–29.

Witten, I. H., Frank, E., and Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Morgan Kaufmann.
