Conference article

Linear Ensembles of Word Embedding Models

Avo Muromägi
University of Tartu, Tartu, Estonia

Kairit Sirts
University of Tartu, Tartu, Estonia

Sven Laur
University of Tartu, Tartu, Estonia

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 68:12, p. 96-104

NEALT Proceedings Series 29:12, p. 96-104

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper explores linear methods for combining several word embedding models into an ensemble. We construct the combined models using an iterative method based on either ordinary least squares regression or the solution to the orthogonal Procrustes problem. We evaluate the proposed approaches on Estonian, a morphologically complex language for which the available corpora for training word embeddings are relatively small. We compare both combined models with each other and with the input word embedding models using synonym and analogy tests. The results show that, while ordinary least squares regression performs poorly in our experiments, using orthogonal Procrustes to combine several word embedding models into an ensemble leads to 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests.
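
The abstract does not spell out the iteration itself, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes every input model is an n-by-d NumPy matrix over the same vocabulary in the same row order, that the target is initialised from the first model, and that the target is updated as the mean of the mapped models. The names `procrustes_map`, `ols_map`, and `combine` are hypothetical.

```python
import numpy as np


def procrustes_map(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F (orthogonal Procrustes)."""
    # With X.T @ Y = U S V^T, the closed-form minimiser is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def ols_map(X, Y):
    """Unconstrained W minimising ||X @ W - Y||_F (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def combine(models, solver=procrustes_map, n_iter=20, tol=1e-8):
    """Alternate between fitting one linear map per model and re-averaging.

    Assumptions (not taken from the abstract): the target Y starts as a
    copy of the first model and is updated as the mean of the mapped models.
    """
    Y = models[0].copy()
    prev_err = np.inf
    maps = []
    for _ in range(n_iter):
        # Step 1: map every input model onto the current target.
        maps = [solver(X, Y) for X in models]
        # Step 2: the new target is the average of the mapped models.
        Y = np.mean([X @ W for X, W in zip(models, maps)], axis=0)
        # Stop once the total squared residual no longer decreases.
        err = sum(np.linalg.norm(X @ W - Y) ** 2 for X, W in zip(models, maps))
        if prev_err - err < tol:
            break
        prev_err = err
    return Y, maps
```

Calling `combine(models)` would use the Procrustes variant, and `combine(models, solver=ols_map)` the least squares one. Because an orthogonal map preserves dot products and norms, the Procrustes variant keeps each model's cosine similarities unchanged after mapping, whereas the unconstrained least squares map carries no such guarantee.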
