The Corpus of American Norwegian Speech (CANS)

Janne Bondi Johannessen
The Text Laboratory & MultiLing, University of Oslo, Blindern, Oslo, Norway

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:40, p. 297-300

NEALT Proceedings Series 23:40, p. 297-300

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper describes the Corpus of American Norwegian Speech (CANS), a new tool for heritage language research. We present the background for the corpus, its linguistic content, and its main technical features. The demonstration will show the corpus in use, focusing on problems that are specific to heritage language research and on how the corpus can be searched to provide relevant data.



