Conference article

The Curation Module and Statistical Analysis on VLO Metadata Quality

Davor Ostojic
ACDH-OEAW, Vienna, Austria

Go Sugimoto
ACDH-OEAW, Vienna, Austria

Matej Ďurčo
ACDH-OEAW, Vienna, Austria


Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure

Linköping Electronic Conference Proceedings 136:7, p. 90-101


Published: 2017-05-23

ISBN: 978-91-7685-499-0

ISSN: 1650-3686 (print), 1650-3740 (online)


The Curation Module was developed to facilitate metadata ingestion and curation for the Virtual Language Observatory (VLO). It provides a systematic method for measuring metadata quality and a user-friendly interface for inspecting the profiles, records, and collections of the Component MetaData Infrastructure (CMDI) used in the VLO. A wide range of statistics forms a comprehensive data matrix covering quality scores, publication status, facet coverage, and metadata headers, as well as the number of records and concepts. The module helps various stakeholders identify metadata problems automatically and systematically. Metadata modellers can evaluate the quality of shared profiles, and data creators can assess the validity of newly created records. Data providers can use it to improve their metadata, making valuable linguistic content easier to discover and access, while working groups can examine the actual use of profiles and records when defining the next versions of CMDI and the VLO. The Curation Module thus supports all stages of metadata management and fosters the analysis and improvement of metadata quality, enhancing CLARIN services. In this article, we present a selection of statistics on metadata quality obtained with the Curation Module.
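One of the statistics mentioned above, facet coverage, can be illustrated with a small sketch. It assumes that records have already been parsed from CMDI into dictionaries mapping facet names to lists of values, and it defines coverage as the fraction of facets that carry at least one non-blank value. The facet names, the `CollectionStats` structure, and the helper functions are hypothetical and chosen only for illustration; this is not the Curation Module's actual implementation or scoring method.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical subset of VLO facets, chosen for illustration only.
FACETS = ("language", "resourceClass", "organisation", "availability")


@dataclass
class CollectionStats:
    """One hypothetical row of a per-collection statistics matrix."""
    name: str
    record_count: int
    mean_coverage: float              # average fraction of facets filled per record
    facet_coverage: dict[str, float]  # fraction of records that fill each facet


def has_value(record: dict[str, list[str]], facet: str) -> bool:
    """True if the record has at least one non-blank value for the facet."""
    return any(v.strip() for v in record.get(facet, []))


def record_coverage(record: dict[str, list[str]], facets=FACETS) -> float:
    """Fraction of the given facets for which the record has a value."""
    return sum(has_value(record, f) for f in facets) / len(facets)


def collection_stats(name: str, records: list[dict[str, list[str]]],
                     facets=FACETS) -> CollectionStats:
    """Aggregate per-record coverage into collection-level statistics."""
    if not records:
        return CollectionStats(name, 0, 0.0, {f: 0.0 for f in facets})
    per_facet = {
        f: sum(has_value(r, f) for r in records) / len(records) for f in facets
    }
    return CollectionStats(
        name=name,
        record_count=len(records),
        mean_coverage=mean(record_coverage(r, facets) for r in records),
        facet_coverage=per_facet,
    )
```

For example, a collection in which one record fills three of the four facets and another fills only `language` has a mean coverage of 0.5, and its per-facet row shows at a glance that `availability` is never provided.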


Keywords: Metadata curation, Quality control, Metadata analysis and assessment, Curation module, VLO (Virtual Language Observatory), CMDI (Component Metadata Infrastructure)


