Janne Bondi Johannessen
The Text Laboratory & MultiLing, University of Oslo, Blindern, Oslo, Norway
Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Linköping Electronic Conference Proceedings 109:40, p. 297-300
NEALT Proceedings Series 23:40, p. 297-300
Published: 2015-05-06
ISBN: 978-91-7519-098-3
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper describes the Corpus of American Norwegian Speech, a new tool for heritage language research. We present the background to its creation, its linguistic content and its main technical features. The demonstration will show the corpus in use, focussing on problems that are specific to heritage language research and on how the corpus can be searched to retrieve relevant data.