Learner Corpus Anonymization in the Age of GDPR: Insights from the Creation of a Learner Corpus of Swedish

Beáta Megyesi
Uppsala University, Sweden

Lena Granstedt
Umeå University, Sweden

Sofia Johansson
Stockholm University, Sweden

Julia Prentice
University of Gothenburg, Sweden

Dan Rosén
University of Gothenburg, Sweden

Carl-Johan Schenström
University of Gothenburg, Sweden

Gunlög Sundberg
Stockholm University, Sweden

Mats Wirén
Stockholm University, Sweden

Elena Volodina
University of Gothenburg, Sweden

Published in: Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018

Linköping Electronic Conference Proceedings 152:6, pp. 47-56

NEALT Proceedings Series 36:6, pp. 47-56

Published: 2018-11-02

ISBN: 978-91-7685-173-9

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper reports on the status of learner corpus anonymization in the ongoing research infrastructure project SweLL. The main aim of the project is to deliver and make available for research a well-annotated corpus of essays written by second language (L2) learners of Swedish. As practice shows, annotating learner texts is a sensitive process that demands many compromises between ethical and legal demands on the one hand and research and technical demands on the other. The paper gives a concise description of the current status of pseudonymization of language learner data, which is intended to ensure the anonymity of the learners, with numerous examples of these compromises.


learner corpus, anonymization, pseudonymization, legal issues, GDPR



