Conference article

SenSALDO: a Swedish Sentiment Lexicon for the SWE-CLARIN Toolbox

Jacobo Rouces
Språkbanken Text, University of Gothenburg, Sweden

Lars Borin
Språkbanken Text, University of Gothenburg, Sweden

Nina Tahmasebi
Språkbanken Text, University of Gothenburg, Sweden

Stian Rødven Eide
Språkbanken Text, University of Gothenburg, Sweden

Part of: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:18, pp. 177-187

Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Sentiment analysis, or opinion mining, is the automatic classification of text according to the positive or negative sentiment it expresses, and it has become very popular in the last decade. However, most data and software resources are built for English and a few other languages. In this paper we compare and test different corpus-based and lexicon-based methods for creating a sentiment lexicon. We then manually curate the results of the best-performing method. The result, SenSALDO, is a comprehensive sentiment lexicon for Swedish containing 7,618 word senses, together with a full-form version of the lexicon containing 65,953 items (text word forms). SenSALDO is freely available as a research tool in the SWE-CLARIN toolbox under an open-source CC-BY license.

Keywords

Sentiment analysis, Swedish, Lexicon
