
Visualisation in speech corpora: maps and waves in the Glossa system

Michal Kosek
The Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway

Anders Nøklestad
The Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway

Joel Priestley
The Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway

Kristin Hagen
The Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway

Janne Bondi Johannessen
The Text Laboratory, Department of Linguistics and Scandinavian Studies, University of Oslo, Oslo, Norway


Published in: Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 111:4, pp. 23-31

NEALT Proceedings Series 25:4, pp. 23-31


Published: 2015-05-07

ISBN: 978-91-7519-035-8

ISSN: 1650-3686 (print), 1650-3740 (online)


We present Glossa, a web-based system for corpus search and results handling, focussing on two modes of visualisation implemented in the system. First, we describe the use of maps to show the geographical distribution of search results, and their utility for exploring dialectal variation and discovering new isoglosses. Second, we present functionality for speech visualisation, which dynamically generates spectrograms together with pitch and formant representations. Users can also replay selected parts of the waveform, and export and compare the maximum, minimum and average values of these parameters across different selections. Among other things, this can be used to explore in more detail the set of spoken variants revealed by the geographical map view.
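As a rough illustration of comparing maximum, minimum and average parameter values across selections, the sketch below summarises a pitch track over two time windows. It is not Glossa's implementation: the function name `summarise`, the `(time, value)` data layout and the pitch values are all assumptions made up for this example.

```python
from statistics import mean


def summarise(track, start, end):
    """Return the min, max and mean of values whose time lies in [start, end].

    `track` is a list of (time_in_seconds, value) pairs. This layout is an
    assumption for illustration, not a format documented for Glossa.
    """
    values = [value for time, value in track if start <= time <= end]
    if not values:
        raise ValueError("selection contains no measurements")
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Hypothetical pitch track (F0 in Hz); the numbers are invented.
pitch = [
    (0.00, 110.0), (0.01, 120.0), (0.02, 130.0),
    (0.03, 180.0), (0.04, 200.0), (0.05, 220.0),
]

first = summarise(pitch, 0.00, 0.02)
second = summarise(pitch, 0.03, 0.05)

# Compare the two selections statistic by statistic.
difference = {key: second[key] - first[key] for key in first}
print(first, second, difference)
```

The same function could be applied to a formant track, since it only assumes timestamped numeric values.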


speech visualisation; geovisualisation; speech corpora


