Parliamentary Corpora in the CLARIN Infrastructure

Darja Fišer
Department of Translation, Faculty of Arts, University of Ljubljana; Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia

Jakob Lenardič
Department of Translation, Faculty of Arts, University of Ljubljana, Slovenia

Included in: Selected papers from the CLARIN Annual Conference 2017, Budapest, 18–20 September 2017

Linköping Electronic Conference Proceedings 147:7, pp. 75–85

Published: 2018-05-16

ISBN: 978-91-7685-273-6

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper gives an overview of the parliamentary records and corpora from CLARIN countries, focusing on their availability through the CLARIN infrastructure. Based on the results of our survey, we provide a comprehensive overview of the corpora and draw up a list of recommendations for optimizing how the corpora are deposited and catalogued in the CLARIN repositories, so that they become readily accessible to researchers from different disciplines. We also analyse the recall and precision of simple and faceted search for parliamentary corpora in the Virtual Language Observatory.


Keywords: parliamentary records, parliamentary corpora, resource accessibility
