Manuela Sanguinetti
Università di Torino, Dipartimento di Informatica, Torino, Italy
Cristina Bosco
Università di Torino, Dipartimento di Informatica, Torino, Italy
Alessandro Mazzei
Università di Torino, Dipartimento di Informatica, Torino, Italy
Alberto Lavelli
Fondazione Bruno Kessler, Trento, Italy
Fabio Tamburini
Università di Bologna, FICLIT, Bologna, Italy
Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy
Linköping Electronic Conference Proceedings 139:26, p. 229-239
Published: 2017-09-13
ISBN: 978-91-7685-467-9
ISSN: 1650-3686 (print), 1650-3740 (online)
Social media texts have been widely used in recent years for tasks related to sentiment analysis and opinion mining; nevertheless, they feature a wide range of linguistic phenomena that have proved particularly challenging for automatic processing, especially syntactic parsing. In this paper, we describe a recently started project for the development of PoSTWITA-UD, a novel Italian Twitter treebank in Universal Dependencies (UD). The paper focuses on the treebank's development steps and on the challenges this work poses for both automatic systems and human annotators. We discuss the errors produced by automatic systems, parsers in particular, and the guidelines we adopted for the manual revision of the annotated tweets. In presenting these guidelines, we aim to bring to the reader’s attention the most critical cases encountered so far (critical in themselves, but also from a UD perspective), which stem from the specific characteristics of the texts we are dealing with.