Conference article

The Impact of Copyright and Personal Data Laws on the Creation and Use of Models for Language Technologies

Aleksei Kelli
University of Tartu, Estonia

Arvi Tavast
Institute of the Estonian Language, Estonia

Krister Lindén
University of Helsinki, Finland

Kadri Vider
University of Tartu, Estonia

Ramunas Birštonas
Vilnius University, Lithuania

Penny Labropoulou
ILSP/ARC, Greece

Irene Kull
University of Tartu, Estonia

Gaabriel Tavits
University of Tartu, Estonia

Age Värv
University of Tartu, Estonia

Pavel Stranák
Charles University, Czechia

Jan Hajic
Charles University, Czechia

Published in: Selected Papers from the CLARIN Annual Conference 2019

Linköping Electronic Conference Proceedings 172:8, p. 53-65

Published: 2020-07-03

ISBN: 978-91-7929-807-4

ISSN: 1650-3686 (print), 1650-3740 (online)


The authors address the legal issues relating to the creation and use of language models. The article begins with an explanation of the development of language technologies. The authors then analyse the technological process within the framework of copyright, related rights and personal data protection law, and also cover the commercial use of language models. Their main argument is that the legal restrictions applicable to language data containing copyrighted material and personal data usually do not apply to language models, which are generally not considered derivative works. Given the wide range of language models, however, this position is not absolute.


copyright, database right, personal data protection, language models, derivative work, language technology

