Henk van den Heuvel
CLST, Radboud University, Nijmegen, The Netherlands
Nelleke Oostdijk
CLST, Radboud University, Nijmegen, The Netherlands
Eric Sanders
CLST, Radboud University, Nijmegen, The Netherlands
Vanja De Lint
CLST, Radboud University, Nijmegen, The Netherlands
Published in: Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands
Linköping Electronic Conference Proceedings 116:5, p. 54-62
Published: 2015-08-26
ISBN: 978-91-7685-954-4
ISSN: 1650-3686 (print), 1650-3740 (online)
Data curation comprises activities such as digitizing data (where necessary), converting data to conform to accepted standard formats, (re)shaping metadata and adding documentation. In this contribution we present the motivation for a data curation service (DCS) in the CLARIN-NL project, and the activities the DCS has carried out in recent years to curate a variety of resources, including dialect dictionaries, speech databases for language acquisition, and interview data. In the second part, we present our view on how data curation can best be addressed in the future as an integral part of research data management, and on the role an expertise centre such as the DCS could play in this context. We envisage and advocate a future shift in which data management becomes an integral part of the overall research data management plan (DMP) from the very start of a project. For researchers, university libraries are a natural entry point for data management issues, while data expertise centres can be set up as back offices for consultancy and data curation tasks.
Keywords: language resources; sustainable infrastructure; data curation; research data management