Improving the Cross-Lingual Projection of Syntactic Dependencies

Jörg Tiedemann
Department of Linguistics and Philology, Uppsala University, Sweden

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:24, pp. 191-199

NEALT Proceedings Series 23:24, pp. 191-199

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper presents several modifications of the standard annotation projection algorithm for syntactic structures in cross-lingual dependency parsing. Our approach avoids unnecessary dummy nodes and includes efficient data subset selection techniques that have a substantial impact on parser performance in terms of labeled attachment scores. We test our techniques on data from the Universal Dependency Treebank and demonstrate the improvements on a number of language pairs. We also look at treebank translation, including syntax-based models, and at data combination techniques that push the performance even further. We achieve absolute improvements of up to almost six points in labeled attachment scores, pushing the state of the art in cross-lingual dependency parsing for all language pairs tested in our experiments.
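
For readers unfamiliar with the metric, the labeled attachment score (LAS) is the percentage of tokens whose predicted head and dependency label both match the gold annotation. The following is a minimal sketch of that definition, not the evaluation code used in the paper; the function name, the `(head, label)` tuple representation, and the toy sentence are assumptions made for illustration, and details such as punctuation handling are not specified here.

```python
def labeled_attachment_score(gold, predicted):
    """Return LAS as a percentage.

    gold, predicted: lists of (head_index, relation_label) tuples,
    one per token, in the same token order (hypothetical format).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts as correct only if head AND label both match.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Toy four-token sentence (invented data, not taken from the paper).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
baseline = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "nmod")]
improved = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]

las_baseline = labeled_attachment_score(gold, baseline)
las_improved = labeled_attachment_score(gold, improved)
# An "absolute improvement" is the plain difference between two scores.
absolute_gain = las_improved - las_baseline
```

An absolute improvement of "almost six points" therefore means that the LAS of the improved system is nearly six percentage points higher than that of the baseline on the same test data.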


