Conference article

Data collection for learner corpus of Latvian: copyright and personal data protection

Inga Kaija
Institute of Mathematics and Computer Science, University of Latvia; Riga Stradiņš University, Latvia

Ilze Auzina
Institute of Mathematics and Computer Science, University of Latvia, Riga, Latvia

Published in: Selected Papers from the CLARIN Annual Conference 2019

Linköping Electronic Conference Proceedings 172:6, p. 41-47

Published: 2020-07-03

ISBN: 978-91-7929-807-4

ISSN: 1650-3686 (print), 1650-3740 (online)


Copyright and personal data protection are two of the most important legal aspects of collecting data for a learner corpus. The paper explains the challenges in data collection for the learner corpus of Latvian “LaVA” and describes the procedure used to protect the rights of the texts’ authors. A combined agreement and metadata questionnaire form was created to inform the authors how their texts will be used and to obtain their permission for that use. The information, the permission, and the metadata questionnaire are printed on one side of an A4 sheet, and the author writes the text by hand on the other side, which eliminates the need to identify the author of the text separately. After the texts have been scanned and added to the corpus, the originals are returned to the authors.


copyright, personal data protection, learner corpus, Latvian

