Conference article

Baltic and Nordic Parts of the European Linguistic Infrastructure

Inguna Skadina
Tilde, Riga, Latvia

Andrejs Vasiljevs
Tilde, Riga, Latvia

Lars Borin
University of Gothenburg, Gothenburg, Sweden

Krister Lindén
University of Helsinki, Helsinki, Finland

Gyri Losnegaard
University of Bergen, Bergen, Norway

Sussi Olsen
University of Copenhagen, CST, Copenhagen, Denmark

Bolette S. Pedersen
University of Copenhagen, CST, Copenhagen, Denmark

Roberts Rozis
Tilde, Riga, Latvia

Koenraad De Smedt
University of Bergen, Bergen, Norway

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:19, p. 195-211

NEALT Proceedings Series 16:19, p. 195-211

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes scientific, technical, and legal work on creating the linguistic infrastructure for the Nordic and Baltic countries. It covers research assessing language technology support for the languages of the Baltic and Nordic countries, work on establishing a language resource sharing infrastructure, and the collection and description of linguistic resources. We present improvements necessary to ensure the usability and interoperability of language resources, discuss issues related to intellectual property rights for complex resources, and describe the extension of the infrastructure through integration of language-resource-specific repositories. Work on treebanks, wordnets, terminology resources, and finite-state technology is described in more detail. Finally, our approach to ensuring the sustainability of the infrastructure is discussed.

Keywords

language resources and tools; linguistic infrastructure; under-resourced languages; multilinguality; treebanks; wordnets; terminology banks

