Conference article

CLARIN-DK - status and challenges

Lene Offersgaard
University of Copenhagen, Denmark

Bart Jongejan
University of Copenhagen, Denmark

Mitchell Seaton
University of Copenhagen, Denmark

Dorte Haltrup Hansen
University of Copenhagen, Denmark

Published in: Proceedings of the workshop on Nordic language research infrastructure at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 20

Linköping Electronic Conference Proceedings 89:3, p. 21-32

NEALT Proceedings Series 20:3, p. 21-32

Published: 2013-05-17

ISBN: 978-91-7519-585-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

The CLARIN-DK initiative, which started as the Danish preparatory project DK-CLARIN, is part of the Danish research infrastructure initiative DIGHUMLAB. This paper presents the aims, status, and current challenges of CLARIN-DK. CLARIN-DK focuses on written and spoken language resources and on multimodal resources and tools, and involving users is a core issue. Users involved in the preparatory project gave input that led to the current user interface of the resource repository website, clarin.dk. Clarin.dk is now in transition from a repository to a research infrastructure in which researchers and students can be supported in their research, education, and studies. Clarin.dk has a Service-Oriented Architecture (SOA), uses eSciDoc and Fedora Commons, and is primarily based on open source solutions. A key issue in CLARIN-DK is the use of standards such as TEI P5, IMDI, OLAC, and CMDI for resource metadata. Optional metadata fields suggested by users have been included where they comply with the standards, allowing for the diversity needed when describing research material. Current work includes normalising metadata naming in the search pages and making search more user-friendly by adding selectable pick-lists for query values. Metadata quality is also being consolidated by changing some metadata values to a more harmonised set of values. All deposited metadata are maintained. Clarin.dk will apply for assessment as a CLARIN ERIC B centre in 2013, reinforcing the sustainability and persistency of the infrastructure. As part of the CLARIN centre requirements, clarin.dk has already joined the national identity federation WAYF, implemented SSL certificates, and offers harvesting of metadata via OAI-PMH.

Keywords

Infrastructure; Language Resources; Repository; metadata; CLARIN
