Alexander Popov
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria
Published in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018
Linköping Electronic Conference Proceedings 159:16, p. 152-165
Published: 2019-05-28
ISBN: 978-91-7685-034-3
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper describes a multi-component research project on the computational lexicon, the results of which will be used and built upon in work within the CLARIN infrastructure to be developed by the Bulgarian national consortium. Princeton WordNet is used as the primary lexicographic resource for producing machine-oriented models of meaning. Its dictionary and semantic network are used to build knowledge graphs, which are then enriched with additional semantic and syntactic relations extracted from various other sources. Experimental results demonstrate that this enrichment leads to more accurate lexical analysis. The same graph models are used to create distributed semantic models (or "embeddings"), which perform very competitively on standard word similarity and relatedness tasks. The paper discusses how such vector models of the lexicon can be used as input features to neural network systems for word sense disambiguation. Several neural architectures are discussed, including two multi-task architectures, which are trained to reflect more accurately the polyvalent nature of lexical items. Thus, the paper provides a faceted view of the computational lexicon, in which separate aspects of it are modeled in different ways, drawing on different theoretical frameworks and data sources, and are used for different purposes.
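As a rough illustration of the first step described in the abstract (turning the WordNet semantic network into a knowledge graph), the following minimal sketch builds a graph over synsets using NLTK's WordNet interface and networkx. It is not the authors' pipeline; the choice of relations (hypernymy, "also see") and the undirected graph structure are assumptions made only for demonstration.

```python
# Minimal sketch: build a knowledge graph from Princeton WordNet relations.
# Requires: pip install nltk networkx; then nltk.download('wordnet') once.
import networkx as nx
from nltk.corpus import wordnet as wn

G = nx.Graph()
for synset in wn.all_synsets():
    G.add_node(synset.name())
    # Add edges for hypernymy (is-a) relations.
    for hyper in synset.hypernyms():
        G.add_edge(synset.name(), hyper.name(), relation="hypernym")
    # Add edges for the looser "also see" relation.
    for related in synset.also_sees():
        G.add_edge(synset.name(), related.name(), relation="also_see")

print(G.number_of_nodes(), "synsets,", G.number_of_edges(), "edges")
```

A graph of this kind can then be enriched with further relations from other resources and fed to a graph-embedding method to obtain the distributed lexical representations the abstract refers to.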
Lexical modeling,
WordNet,
Word sense disambiguation,
Neural networks,
Word embeddings,
Knowledge graphs