Conference article

Lexical Modeling for Natural Language Processing

Alexander Popov
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria

Published in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:16, p. 152-165

Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes a multi-component research project on the computational lexicon, the results of which will be used and built upon in work within the CLARIN infrastructure to be developed by the Bulgarian national consortium. Princeton WordNet is used as the primary lexicographic resource for producing machine-oriented models of meaning. Its dictionary and semantic network are used to build knowledge graphs, which are then enriched with additional semantic and syntactic relations extracted from various other sources. Experimental results demonstrate that this enrichment leads to more accurate lexical analysis. The same graph models are used to create distributed semantic models (or "embeddings"), which perform very competitively on standard word similarity and relatedness tasks. The paper discusses how such vector models of the lexicon can be used as input features to neural network systems for word sense disambiguation. Several neural architectures are discussed, including two multi-task architectures, which are trained to reflect more accurately the polyvalent nature of lexical items. Thus, the paper provides a faceted view of the computational lexicon, in which separate aspects of it are modeled in different ways, relying on different theoretical and data sources, and are used for different purposes.

Keywords

Lexical modeling, WordNet, Word sense disambiguation, Neural networks, Word embeddings, Knowledge graphs
