Conference article

Universal Dependencies Are Hard to Parse – Or Are They?

Ines Rehbein
Leibniz ScienceCampus, Institut für Deutsche Sprache Mannheim, Germany

Julius Steen
Leibniz ScienceCampus, Universität Heidelberg, Germany

Bich-Ngoc Do
Leibniz ScienceCampus, Universität Heidelberg, Germany

Anette Frank
Leibniz ScienceCampus, Universität Heidelberg, Germany

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:25, pp. 218-228

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when a parser is trained on UD-style annotations, and whether the effect is equally strong for all languages. We conduct a series of experiments in which we systematically modify individual annotation decisions taken in the UD scheme, and show that these modifications increase parsing accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to treat content words as heads, increases dependency length for nearly all treebanks and increases arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.
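
The two properties named in the abstract can be illustrated with a small sketch. It assumes that dependency length is the absolute distance between a dependent's position and its head's position, and that arc direction entropy is the frequency-weighted entropy of head direction (head before vs. after the dependent) conditioned on the relation label. The paper's exact definitions may differ, for example by also conditioning on part-of-speech. The toy sentence, the function names and the function-head labels ("prep", "pobj") are hypothetical illustrations, not data or code from the paper.

```python
import math
from collections import Counter, defaultdict

# Each token: (id, form, head_id, deprel); head_id 0 marks the root.
# Hypothetical toy data for illustration only.
Sentence = list[tuple[int, str, int, str]]

# "The dog is in the house", UD style: the content word "house" heads
# the copula, the preposition, the determiner and the subject.
ud_style: Sentence = [
    (1, "The", 2, "det"), (2, "dog", 6, "nsubj"), (3, "is", 6, "cop"),
    (4, "in", 6, "case"), (5, "the", 6, "det"), (6, "house", 0, "root"),
]

# Same sentence with function words as heads (labels are hypothetical).
func_head: Sentence = [
    (1, "The", 2, "det"), (2, "dog", 3, "nsubj"), (3, "is", 0, "root"),
    (4, "in", 3, "prep"), (5, "the", 6, "det"), (6, "house", 4, "pobj"),
]

def mean_dependency_length(sent: Sentence) -> float:
    """Average |head - dependent| over all non-root arcs."""
    lengths = [abs(head - tid) for tid, _, head, _ in sent if head != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

def arc_direction_entropy(sents: list[Sentence]) -> float:
    """Entropy (bits) of arc direction given the relation label,
    weighted by how often each label occurs (an assumed definition)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sent in sents:
        for tid, _, head, rel in sent:
            if head == 0:
                continue  # the root arc has no direction
            counts[rel]["head_first" if head < tid else "head_last"] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts.values():
        n = sum(c.values())
        h_rel = -sum((k / n) * math.log2(k / n) for k in c.values())
        entropy += (n / total) * h_rel
    return entropy

print(mean_dependency_length(ud_style))   # 2.2
print(mean_dependency_length(func_head))  # 1.2
```

In this single toy sentence, making the content word the head yields a longer average arc (2.2 vs. 1.2 tokens). Measured over a whole treebank, the same functions would also show whether a given relation is attached consistently to the left or to the right.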
