Universal Dependencies and a Non-Native Czech

Jirka Hana
Charles University, Malostranské nám. 25, 118 00 Prague 1, Czech Republic

Barbora Hladká
Charles University, Malostranské nám. 25, 118 00 Prague 1, Czech Republic

Published in: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:11, pp. 105-114

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)


CzeSL is a learner corpus of texts produced by non-native speakers of Czech. Such corpora are a rich source of information about specific features of learners’ language, helping language teachers and researchers in second language acquisition. In our project, we have focused on syntactic annotation of non-native text within the framework of Universal Dependencies. As far as we know, this is the first project to annotate a richly inflectional non-native language. Our ideal goal has been to annotate according to the non-native grammar in the mind of the author, not according to the standard grammar. However, this brings many challenges. First, we do not have enough data to get reliable insights into the grammar of each author. Second, many phenomena are far more complicated than they are in native languages. We believe that the most important result of this project is not the actual annotation, but the guidelines and principles, which can be used as a basis for annotating other non-native languages.


learner corpus, second language, syntax annotation, Universal Dependencies, second language acquisition


