Conference paper

Large-Scale Contextualised Language Modelling for Norwegian

Andrey Kutuzov

Jeremy Barnes

Erik Velldal

Lilja Øvrelid

Stephan Oepen

In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.

Linköping Electronic Conference Proceedings 178:4, pp. 30-40

NEALT Proceedings Series 45:4, pp. 30-40

Published: 2021-05-21

ISBN: 978-91-7929-614-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu
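As a rough illustration of how such pre-trained contextualised models are typically consumed downstream, the sketch below loads a BERT-style Norwegian checkpoint with the Hugging Face Transformers library and extracts contextualised token embeddings for a sentence. The model identifier "ltg/norbert" is only an assumed example, not something stated in the abstract; the actual distribution channels for the NorLM models are documented at http://norlm.nlpl.eu.

# Minimal sketch (Python), assuming the models are available through the
# Hugging Face Hub; the identifier below is a hypothetical example.
from transformers import AutoTokenizer, AutoModel

model_name = "ltg/norbert"  # assumed Hub identifier, see http://norlm.nlpl.eu
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a Norwegian sentence and obtain contextualised token representations.
inputs = tokenizer("Dette er en setning på norsk.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, num_tokens, hidden_size)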

Keywords

ELMo, BERT, Norwegian, pre-trained models, contextualized embeddings, Nordic language models
