Conference article

Annotating Italian Social Media Texts in Universal Dependencies

Manuela Sanguinetti
Università di Torino, Dipartimento di Informatica, Torino, Italy

Cristina Bosco
Università di Torino, Dipartimento di Informatica, Torino, Italy

Alessandro Mazzei
Università di Torino, Dipartimento di Informatica, Torino, Italy

Alberto Lavelli
Fondazione Bruno Kessler, Trento, Italy

Fabio Tamburini
Università di Bologna, FICLIT, Bologna, Italy

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:26, p. 229-239

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Social media texts have been widely used in recent years for various tasks related to sentiment analysis and opinion mining; nevertheless, they still feature a wide range of linguistic phenomena that have proved particularly challenging for automatic processing, especially syntactic parsing. In this paper, we describe a recently started project for the development of PoSTWITA-UD, a novel Italian Twitter treebank in Universal Dependencies. The paper focuses on its development steps and on the challenges such work entails for both automatic systems and human annotators. To this end, we discuss the errors produced by the automatic systems, parsers in particular, and the guidelines we adopted for the manual revision of annotated tweets. These guidelines aim to bring to the reader’s attention the most critical cases encountered so far (critical in themselves, but also from a UD perspective), which stem from the specific characteristics of the texts we are dealing with.
