Conference article

Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning

Elizaveta Kuzmenko
National Research University Higher School of Economics, Russia

Andrey Kutuzov
National Research University Higher School of Economics, Russia


Published in: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:7, p. 87–97

NEALT Proceedings Series 22:7, p. 87–97


Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


The paper describes a learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated corpus available online, currently containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora and is under active development, growing both in size and in the features offered to users. We describe our perspective on the corpus, the data sources, and the tools used in compiling it. The detailed, custom-built classification of learner error types is described thoroughly. The paper also presents a pilot experiment on creating test sets for particular learner problems using corpus data.


Learner corpora; English as a second language; computer-assisted language learning


