Hamps Lilliehöök
Department of Computer and Information Science, Linköping University, Sweden
Magnus Merkel
Department of Computer and Information Science, Linköping University, Sweden
Published in: Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19
Linköping Electronic Conference Proceedings 88:4, p. 21-35
NEALT Proceedings Series 19:4, p. 21-35
Published: 2013-05-17
ISBN: 978-91-7519-586-5
ISSN: 1650-3686 (print), 1650-3740 (online)
In this article we describe work on creating word clusters in two steps. First, a graph-based approach to semantic mirroring is used to create primary synonym clusters from a bilingual lexicon. Second, the data is represented as vectors in a large vector space, and a resource of synonym clusters is constructed by applying K-means centroid-based clustering to the vectors. We evaluate the results automatically against WordNet and manually on a sample of word clusters. Prospects and applications of the approach are also discussed.
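The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English-Swedish lexicon, the simplified `mirror` function (which collects words sharing a translation rather than partitioning a graph), the binary translation vectors, and the hand-picked seed words are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import defaultdict

# Hypothetical toy English -> Swedish lexicon (not data from the paper).
LEXICON = {
    "big": {"stor", "omfattande"},
    "large": {"stor", "omfattande"},
    "huge": {"stor", "enorm"},
    "quick": {"snabb", "rask"},
    "fast": {"snabb", "rask"},
    "rapid": {"snabb"},
}


def mirror(word, lexicon):
    """Simplified mirroring: source words sharing a translation with `word`."""
    inverse = defaultdict(set)
    for source, targets in lexicon.items():
        for target in targets:
            inverse[target].add(source)
    related = set()
    for target in lexicon[word]:
        related |= inverse[target]
    related.discard(word)
    return related


def to_vectors(lexicon):
    """One binary vector per word over the sorted target vocabulary (assumed)."""
    dims = sorted({t for targets in lexicon.values() for t in targets})
    return {
        word: [1.0 if d in targets else 0.0 for d in dims]
        for word, targets in lexicon.items()
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, iterations=10):
    """Plain Lloyd-style K-means; `seeds` names the words used as initial centroids."""
    centroids = [list(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each word to its nearest centroid.
        assignment = {
            word: min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
            for word, vec in vectors.items()
        }
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [vectors[w] for w, c in assignment.items() if c == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    clusters = defaultdict(set)
    for word, c in assignment.items():
        clusters[c].add(word)
    return [clusters[i] for i in range(len(centroids))]


if __name__ == "__main__":
    print(mirror("big", LEXICON))
    print(kmeans(to_vectors(LEXICON), seeds=["big", "quick"]))
```

On this toy lexicon, mirroring "big" yields "large" and "huge", and K-means with the seeds "big" and "quick" separates the size words from the speed words. A realistic setup would need a much larger lexicon, a principled choice of K, and a more robust initialization than fixed seed words.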