A Use Case for Linguistic Research on Dutch with CLARIN

Jan Odijk
Utrecht University, The Netherlands


Published in: Selected Papers from the CLARIN Annual Conference 2015, October 14–16, 2015, Wroclaw, Poland

Linköping Electronic Conference Proceedings 123:4, pp. 45–61

NEALT Proceedings Series 28:4, pp. 45–61


Published: 2016-04-11

ISBN: 978-91-7685-765-6

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper I describe a particular Dutch linguistic problem and show that, thanks to CLARIN, it can be addressed in a better, more efficient, and more user-friendly manner than ever before. A few years ago, most of the data used in the investigation could be used only by technical experts; they are now available to all linguists through a variety of easily accessible web applications developed in CLARIN, with interfaces dedicated to their intended users. However, the investigation also shows that many further extensions and improvements can and must still be made. Fortunately, most of these are being implemented in currently running projects.


No keywords are available


