Conference article

The Corpus of American Norwegian Speech (CANS)

Janne Bondi Johannessen
The Text Laboratory & MultiLing, University of Oslo, Blindern, Oslo, Norway


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:40, p. 297-300

NEALT Proceedings Series 23:40, p. 297-300


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper describes the Corpus of American Norwegian Speech, a new tool for heritage language research. We present the background for the corpus, its linguistic contents, and its main technical features. The demonstration will show the corpus in use, focusing on problems specific to heritage language research and on how the corpus can be searched to provide relevant data.



