Data Conversion and Consistency of Monolingual Corpora: Russian UD Treebanks

Kira Droganova
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic

Olga Lyashevskaya
National Research University Higher School of Economics, Moscow, Russia / Vinogradov Institute of the Russian Language RAS, Moscow, Russia

Daniel Zeman
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic

Published in: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:7, pp. 52-65

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper we focus on syntactic annotation consistency within the Universal Dependencies (UD) treebanks for Russian: UD_Russian-SynTagRus, UD_Russian-GSD, UD_Russian-Taiga, and UD_Russian-PUD. We describe the four treebanks, their distinctive features, and their development. To test and improve consistency within the treebanks, we revisited the experiments by Martínez Alonso and Zeman; our parsing experiments were conducted using a state-of-the-art parser that took part in the CoNLL 2017 Shared Task. We analyze error classes in functional and content relations and discuss a method to separate errors induced by annotation inconsistency from those caused by syntactic complexity and other factors.


annotation consistency, Universal Dependencies, Russian treebanks, dependency parsing


