Building a learner corpus for Russian

Ekaterina Rakhilina
National Research University Higher School of Economics, Moscow, Russia

Anastasia Vyrenkova
National Research University Higher School of Economics, Moscow, Russia

Elmira Mustakimova
National Research University Higher School of Economics, Moscow, Russia

Alina Ladygina
Eberhard Karls Universität Tübingen, Tübingen, Germany

Ivan Smirnov
Sholokhov Moscow State University for the Humanities, Moscow, Russia


In: Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016

Linköping Electronic Conference Proceedings 130:9, pp. 66-75


Published: 2016-11-15

ISBN: 978-91-7685-633-8

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper we describe an open learner corpus of Russian. The Russian Learner Corpus (RLC) is the first corpus to distinguish clearly between foreign language learners and heritage speakers. We discuss the structure of the corpus, its development, and its annotation principles. We also describe the RLC platform, which combines online tools for text uploading, processing, error annotation, and corpus search.


Learner corpus, Error annotation, Corpus processing tool, Pedagogical resource


