Semi-automated typical error annotation for learner English essays: Integrating frameworks

Andrey Kutuzov
National Research University, Higher School of Economics, Russia

Elizaveta Kuzmenko
National Research University, Higher School of Economics, Russia

Published in: Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping Electronic Conference Proceedings 114:5, pp. 35-41

NEALT Proceedings Series 26:5, pp. 35-41

Published: 2015-05-06

ISBN: 978-91-7519-036-5

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper proposes integrating three open-source utilities: the brat web annotation tool, the Freeling suite of linguistic analyzers, and the Aspell spellchecker. We demonstrate how their combination can be used to pre-annotate texts in a learner corpus of English essays with potential errors, easing the work of human annotators. Spellchecker alerts and the morphological analyzer's tagging probabilities are used to detect the most typical kinds of possible student errors. The F-measure of the pre-annotation framework, measured against human annotation, is 0.57, which already makes the system a substantial help to human annotators while leaving room for further improvement.
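
As a rough illustration of the idea in the abstract, the following Python sketch turns per-token signals into brat standoff annotations and scores them against a gold standard. It is not the authors' implementation. It assumes the spellchecker verdict and the tagger's best-tag probability have already been computed for each token; the `Token` class, the labels `Spelling` and `Suspicious_form`, and the 0.6 threshold are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start: int        # character offset of the first character
    end: int          # character offset one past the last character
    misspelled: bool  # assumed output of a spellchecker such as Aspell
    tag_prob: float   # assumed best-tag probability from a tagger such as Freeling


PROB_THRESHOLD = 0.6  # hypothetical cut-off, not taken from the paper


def pre_annotate(tokens, threshold=PROB_THRESHOLD):
    """Return brat standoff text-bound annotation lines for suspicious tokens."""
    lines = []
    for tok in tokens:
        if tok.misspelled:
            label = "Spelling"          # hypothetical error label
        elif tok.tag_prob < threshold:
            label = "Suspicious_form"   # hypothetical error label
        else:
            continue
        # brat standoff: ID <TAB> TYPE START END <TAB> covered text
        lines.append(f"T{len(lines) + 1}\t{label} {tok.start} {tok.end}\t{tok.text}")
    return lines


def f_measure(predicted, gold):
    """Balanced F-measure over exact-match (start, end) spans."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, the lines returned by `pre_annotate` could be written to an `.ann` file next to the essay text so that annotators see the candidate errors in brat and only confirm, correct, or delete them. How spans are matched when computing the reported F-measure is not stated in the abstract; exact span matching is used here purely for simplicity.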


learner corpora; error annotation; pre-annotation


