Max Berggren
Gavagai & Royal Institute of Technology, KTH, Stockholm, Sweden
Jussi Karlgren
Gavagai & Royal Institute of Technology, KTH, Stockholm, Sweden
Robert Östling
Department of Linguistics, Stockholm University, Sweden
Mikael Parkvall
Department of Linguistics, Stockholm University, Sweden
Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Linköping Electronic Conference Proceedings 109:26, p. 211-218
NEALT Proceedings Series 23:26, p. 211-218
Published: 2015-05-06
ISBN: 978-91-7519-098-3
ISSN: 1650-3686 (print), 1650-3740 (online)
For computational dialectology and other geographically bound text analysis tasks, texts must be annotated with their own location or that of their authors. Many texts are locatable, but most carry no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate texts. Following previous research efforts, a Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational a word is. We find that the most useful results come from combining three steps: modelling word distributions to account for several locations, and thus several Gaussian distributions per word; defining a filter that picks out words with high placeness based on their local distributional context; and aggregating locational information in a centroid. The results are applied to data in the Swedish language.
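The pipeline outlined in the abstract (per-word Gaussians, a placeness filter, centroid aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation and simplifies heavily: each word gets a single axis-aligned Gaussian rather than the several Gaussians per word that the abstract reports as most useful, placeness is approximated by an assumed inverse-variance score rather than the paper's own definition, and the filter is a plain threshold rather than one based on local distributional context. The function names, the thresholds `min_placeness` and `min_count`, and the toy posts with approximate coordinates are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance

# Toy geotagged posts (text, (lat, lon)); coordinates are illustrative only.
TOY_POSTS = [
    ("fika i stockholm idag", (59.33, 18.07)),
    ("stockholm är kallt idag", (59.35, 18.05)),
    ("regn i göteborg idag", (57.71, 11.97)),
    ("göteborg är fint", (57.70, 11.98)),
    ("fika i malmö", (55.60, 13.00)),
]


def fit_word_gaussians(posts):
    """Fit one axis-aligned Gaussian per word.

    Returns word -> (mean_lat, mean_lon, var_lat, var_lon, count).
    """
    coords = defaultdict(list)
    for text, (lat, lon) in posts:
        # Count each word once per post.
        for word in set(text.lower().split()):
            coords[word].append((lat, lon))
    models = {}
    for word, points in coords.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        models[word] = (fmean(lats), fmean(lons),
                        pvariance(lats), pvariance(lons), len(points))
    return models


def placeness(model, eps=1e-6):
    """Assumed proxy: words with small spatial variance score high."""
    _, _, var_lat, var_lon, _ = model
    return 1.0 / (var_lat + var_lon + eps)


def locate(text, models, min_placeness=1.0, min_count=2):
    """Placeness-weighted centroid of the high-placeness words in text."""
    selected = []
    for word in set(text.lower().split()):
        model = models.get(word)
        # Skip unseen words and words with too few observations.
        if model is None or model[4] < min_count:
            continue
        score = placeness(model)
        if score >= min_placeness:
            selected.append((score, model[0], model[1]))
    if not selected:
        return None  # no location-indicating word found
    total = sum(score for score, _, _ in selected)
    lat = sum(score * la for score, la, _ in selected) / total
    lon = sum(score * lo for score, _, lo in selected) / total
    return lat, lon


if __name__ == "__main__":
    models = fit_word_gaussians(TOY_POSTS)
    print(locate("fika i göteborg idag", models))
```

On this toy data, widely used words such as "idag" and "i" have large spatial variance and are filtered out, so the estimate is driven by the place name alone. A text containing no high-placeness word yields `None` instead of a guess.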