Utilizing Pre-Trained Word Embeddings to Learn Classification Lexicons with Little Supervision

Frederick Blumenthal
d-fine GmbH, Germany

Ferdinand Graf
d-fine GmbH, Germany

Published in: Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 165:2, pp. 5-15

NEALT Proceedings Series 40:2, pp. 5-15

Published: 2019-09-30

ISBN: 978-91-7929-997-2

ISSN: 1650-3686 (print), 1650-3740 (online)


Much of the decision making in financial institutions, particularly regarding investments and risk management, is data-driven. Text classification, and sentiment analysis in particular, is an important task for gaining insights from unstructured text documents. Sentiment lexicons, i.e. lists of words with corresponding sentiment orientations, are a valuable resource for building strong baseline models for sentiment analysis that are easy to interpret and computationally efficient. We present a novel method for learning classification lexicons from a labeled text corpus that incorporates word similarities in the form of pre-trained word embeddings. We show on two sentiment analysis tasks that utilizing pre-trained word embeddings improves accuracy over the baseline method. The improvement is particularly large when labeled data is scarce, which is often the case in the financial domain. Moreover, the new method can be used to generate sensible sentiment scores for words outside the labeled training corpus.
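To make the idea of a lexicon-based baseline concrete, the following is a minimal sketch and not the method of the paper, whose details the abstract does not give. It scores a document by averaging per-word lexicon scores, and assigns a score to a word outside the lexicon by a similarity-weighted average over lexicon words that are close in embedding space. The lexicon entries, the two-dimensional embedding vectors, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy lexicon: word -> sentiment score (values invented for illustration).
LEXICON = {"profit": 1.0, "growth": 0.8, "loss": -1.0, "default": -0.9}

# Hypothetical 2-d stand-ins for pre-trained embeddings; real ones have far more dimensions.
EMBEDDINGS = {
    "profit": (0.9, 0.1),
    "growth": (0.8, 0.3),
    "loss": (-0.9, 0.2),
    "default": (-0.8, -0.1),
    "gain": (0.85, 0.15),  # deliberately absent from LEXICON
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_score(word):
    """Return the lexicon score, or an embedding-based estimate for unseen words."""
    if word in LEXICON:
        return LEXICON[word]
    if word not in EMBEDDINGS:
        return 0.0  # no information at all: treat the word as neutral
    # Assumption of this sketch: only positively similar lexicon words contribute.
    neighbours = []
    for known, score in LEXICON.items():
        if known in EMBEDDINGS:
            sim = cosine(EMBEDDINGS[word], EMBEDDINGS[known])
            if sim > 0:
                neighbours.append((sim, score))
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * score for sim, score in neighbours) / total if total else 0.0


def document_score(text):
    """Average word score over whitespace tokens; the sign gives the predicted class."""
    tokens = text.lower().split()
    return sum(word_score(t) for t in tokens) / len(tokens) if tokens else 0.0
```

In this sketch, `word_score("gain")` is positive because "gain" lies close to "profit" and "growth" in the toy embedding space, which illustrates how embeddings can extend a lexicon beyond the words seen in labeled data.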


Document Classification, Sentiment Analysis, Sentiment Lexicons, Dictionary Generation, Word Embeddings

