Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning

Elizaveta Kuzmenko
National Research University Higher School of Economics, Russia

Andrey Kutuzov
National Research University Higher School of Economics, Russia

Part of: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:7, pp. 87–97

NEALT Proceedings Series 22:7, pp. 87–97

Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


The paper describes a learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated corpus available online, currently containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora; it is under active development and aims to grow both in size and in the features offered to users. We describe our perspective on the corpus, its data sources, and the tools used in compiling it. An elaborate, purpose-built classification of learner error types is described in detail. The paper also presents a pilot experiment on creating test sets for particular learner problems using corpus data.


Learner corpora; English as a second language; computer-assisted language learning


