Annotating Errors in Student Texts: First Experiences and Experiments

Sara Stymne
Linguistics and Philology, Uppsala University, Sweden

Eva Pettersson
Linguistics and Philology, Uppsala University, Sweden

Beáta Megyesi
Linguistics and Philology, Uppsala University, Sweden

Anne Palmér
Scandinavian Languages, Uppsala University, Sweden

Published in: Proceedings of the Joint 6th Workshop on NLP for Computer Assisted Language Learning and 2nd Workshop on NLP for Research on Language Acquisition at NoDaLiDa, Gothenburg, 22nd May 2017

Linköping Electronic Conference Proceedings 134:6, p. 47-60

NEALT Proceedings Series 30:6, p. 47-60

Published: 2017-05-11

ISBN: 978-91-7685-502-7

ISSN: 1650-3686 (print), 1650-3740 (online)


We describe the creation of an annotation layer for word-based writing errors in a corpus of student writing. The texts are written in Swedish by students between 9 and 19 years old. Our main purpose is to identify errors regarding spelling, split compounds and merged words. In addition, we identify simple word-based grammatical errors, including morphological errors and extra words. In this paper we describe the corpus and the annotation process, including detailed descriptions of the error types and guidelines. We find that this annotation can be performed with substantial inter-annotator agreement, but that some issues with the annotation remain. We also report results from two pilot experiments, on spelling correction and on the consistency of downstream NLP tools, to exemplify the usefulness of the annotated corpus.


