Conference article

Inferring the location of authors from words in their texts

Max Berggren
Gavagai & Royal Institute of Technology, KTH, Stockholm, Sweden

Jussi Karlgren
Gavagai & Royal Institute of Technology, KTH, Stockholm, Sweden

Robert Östling
Department of Linguistics, Stockholm University, Sweden

Mikael Parkvall
Department of Linguistics, Stockholm University, Sweden

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:26, pp. 211-218

NEALT Proceedings Series 23:26, pp. 211-218

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


For computational dialectology and other geographically bound text analysis tasks, texts must be annotated with their location or that of their authors. Many texts are locatable, but most carry no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate texts and their authors. Following previous research, a Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational a word is. We find that the most useful results come from combining three steps: modelling word distributions to account for several locations, and thus several Gaussian distributions per word; defining a filter that picks out words with high placeness based on their local distributional context; and aggregating locational information in a centroid. The results are applied to data in the Swedish language.
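
As an illustration only, the general shape of such a pipeline can be sketched in a few lines of Python. This is a deliberately simplified sketch and not the authors' implementation: it fits a single Gaussian per word (a mean position and a variance) rather than several, it uses a plain variance threshold as a stand-in for the placeness filter, and the function names `fit_word_models` and `locate`, the parameters `max_spread` and `min_count`, and the toy posts are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fit_word_models(posts):
    """Fit one Gaussian-style summary per word from geotagged posts.

    posts: iterable of (text, (lat, lon)) pairs.
    Returns a dict: word -> (mean_lat, mean_lon, spread, count), where
    spread is the sum of the latitude and longitude variances.
    """
    coords = defaultdict(list)
    for text, (lat, lon) in posts:
        # Count each word once per post.
        for word in set(text.lower().split()):
            coords[word].append((lat, lon))

    models = {}
    for word, points in coords.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        spread = pvariance(lats) + pvariance(lons)
        models[word] = (fmean(lats), fmean(lons), spread, len(points))
    return models


def locate(text, models, max_spread=1.0, min_count=2):
    """Estimate a position as the centroid of the text's locational words.

    Assumption: a word counts as locational if it was seen at least
    min_count times and its spread is at most max_spread. This threshold
    is only a stand-in for the placeness filter described in the abstract.
    Returns (lat, lon), or None if no word passes the filter.
    """
    hits = [models[w] for w in set(text.lower().split()) if w in models]
    hits = [m for m in hits if m[2] <= max_spread and m[3] >= min_count]
    if not hits:
        return None
    return (fmean(m[0] for m in hits), fmean(m[1] for m in hits))


# Toy, made-up training data: (text, (latitude, longitude)).
posts = [
    ("fika i kiruna", (67.9, 20.2)),
    ("kiruna norrsken", (67.9, 20.3)),
    ("fika i malmo", (55.6, 13.0)),
    ("malmo hamn", (55.6, 13.1)),
]
models = fit_word_models(posts)
estimate = locate("fika i kiruna imorgon", models)
```

In this toy data, "fika" and "i" occur in both cities, so their spread is large and the filter discards them; only "kiruna" contributes to the centroid. A closer approximation of the approach in the abstract would replace the single Gaussian per word with a mixture of several.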


