Alexandre Rademaker
IBM Research and EMAp/FGV, Brazil
Fabricio Chalub
IBM Research, Brazil
Livy Real
University of São Paulo, Brazil
Cláudia Freitas
PUC-Rio, Brazil
Eckhard Bick
University of Southern Denmark, Denmark
Valeria de Paiva
Nuance Communications, USA
Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy
Linköping Electronic Conference Proceedings 139:23, p. 197-206
Published: 2017-09-13
ISBN: 978-91-7685-467-9
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper describes the creation of a Portuguese corpus that follows the guidelines of the Universal Dependencies framework. Instead of starting from scratch, we converted an existing Portuguese corpus, Bosque. The conversion applied a context-sensitive set of Constraint Grammar rules to the corpus's original deep linguistic analysis, which was produced by the PALAVRAS parser, with some additional manual corrections. Universal Dependencies promises greater parallelism between languages, a plus for researchers in many areas. We report the challenges of dealing with Portuguese, a Romance language, in the hope that our experience will help others.