Conference article

CMDI 1.2: Improvements in the CLARIN Component Metadata Infrastructure

Twan Goosen
The Language Archive, CLARIN ERIC

Menzo Windhouwer
The Language Archive, Meertens Institute, The Netherlands

Oddrun Ohren
National Library of Norway, Norway

Axel Herold
Berlin-Brandenburg Academy of Sciences and Humanities, Germany

Thomas Eckart
Leipzig University, Germany

Matej Ďurčo
Institute for Corpus Linguistics and Text Technology, Austria

Oliver Schonefeld
Institute for the German Language, Germany


Published in: Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands

Linköping Electronic Conference Proceedings 116:4, p. 36-53


Published: 2015-08-26

ISBN: 978-91-7685-954-4

ISSN: 1650-3686 (print), 1650-3740 (online)


This article reports on the ongoing work on a new version of the Component Metadata Infrastructure (CMDI), the metadata framework central to the CLARIN infrastructure. Version 1.2 introduces a number of important changes based on the experience gathered over the last five years of intensive use of CMDI by the digital humanities community, both addressing problems encountered and introducing new functionality. Besides consolidating the structure of the model and improving schema sanity, the new version introduces new means of lifecycle management aimed at combating the observed proliferation of components; a new mechanism for the use of external vocabularies, which will contribute to more consistent use of controlled values; and cues for tools, which will allow improved presentation of metadata records to human users. The feature set has been frozen and approved, and the infrastructure is now entering a transition phase in which all tools and data need to be migrated to the new version.




