Conference article

Spokes - a search and exploration service for conversational corpus data

Piotr Pezik
University of Lodz, Corpus & Computational Linguistics Laboratory, Poland


Published in: Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands

Linköping Electronic Conference Proceedings 116:9, p. 99-109


Published: 2015-08-26

ISBN: 978-91-7685-954-4

ISSN: 1650-3686 (print), 1650-3740 (online)


Spokes is an online service for conversational corpus data search and exploration, currently developed as part of CLARIN-PL – the Polish CLARIN infrastructure. This paper describes the data sets currently available through Spokes, the architecture of the service and the data and metadata search functionality it provides to its users. We also introduce some of the more experimental features which have been developed to facilitate more advanced research on multimodal conversational corpora.


conversational corpora; multimedia corpus search engine; CLARIN-PL


