Conference article

Fully Delexicalized Contexts for Syntax-Based Word Embeddings

Jenna Kanerva
TurkuNLP Group, University of Turku, Graduate School (UTUGS), Turku, Finland

Sampo Pyysalo
Language Technology Lab, DTAL, University of Cambridge, United Kingdom

Filip Ginter
TurkuNLP Group, University of Turku, Finland


Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:11, pp. 83–91


Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Word embeddings induced from large amounts of unannotated text are a key resource for many NLP tasks. Several recent studies have proposed extensions of the basic distributional semantics approach, in which words form the context of other words, by adding features such as syntactic dependencies. In this study, we look in a different direction, exploring models that leave words out entirely, instead basing the context representation exclusively on syntactic and morphological features. Remarkably, we find that the resulting vectors still capture clear semantic aspects of words in addition to syntactic ones. We assess the properties of the vectors using both intrinsic and extrinsic evaluations, demonstrating in a multilingual parsing experiment on 55 treebanks that fully delexicalized syntax-based word representations yield higher average parsing performance than conventional word2vec embeddings.
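The core idea of the abstract, replacing lexical contexts with purely syntactic and morphological ones, can be sketched as follows. This is an illustrative reconstruction, not the authors' actual feature set: the token fields mirror CoNLL-U, and the context feature names (`deprel=…`, `head_upos=…`, `feat=…`) are hypothetical. The resulting (word, context) pairs are the kind of input a word2vecf-style trainer consumes.

```python
# Hypothetical sketch of extracting fully delexicalized contexts from a
# dependency parse. Every context is built from syntax/morphology only;
# no word form ever appears on the context side.
from typing import NamedTuple

class Token(NamedTuple):
    form: str    # surface word (used only as the target, never as context)
    upos: str    # universal POS tag
    feats: str   # morphological features, e.g. "Number=Plur|Tense=Pres"
    head: int    # 1-based index of the syntactic head (0 = root)
    deprel: str  # dependency relation to the head

def delexicalized_contexts(sentence):
    """Yield (target word, context feature) pairs with lexical
    material excluded from the context representation."""
    for tok in sentence:
        # the token's own relation to its head
        yield tok.form, f"deprel={tok.deprel}"
        # the head's POS tag stands in for the head word itself
        if tok.head > 0:
            head = sentence[tok.head - 1]
            yield tok.form, f"head_upos={head.upos}"
        # each morphological feature becomes a separate context
        for feat in tok.feats.split("|"):
            if feat and feat != "_":
                yield tok.form, f"feat={feat}"

# Toy parse of "dogs bark"
sent = [
    Token("dogs", "NOUN", "Number=Plur", 2, "nsubj"),
    Token("bark", "VERB", "Number=Plur|Tense=Pres", 0, "root"),
]
pairs = list(delexicalized_contexts(sent))
```

Because the context vocabulary is a closed set of tags, relations, and features rather than an open lexicon, it stays small and largely language-independent, which is what makes the multilingual parsing experiment over 55 treebanks feasible.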


