Conference article

Tagging a Norwegian Dialect Corpus

Andre Kåsen
Department of Informatics, University of Oslo, Norway

Kristin Hagen
The Text Laboratory, University of Oslo, Norway

Anders Nøklestad
The Text Laboratory, University of Oslo, Norway

Joel Priestley
The Text Laboratory, University of Oslo, Norway

Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, 2019, Turku, Finland

Linköping Electronic Conference Proceedings 167:40, pp. 350-355

NEALT Proceedings Series 42:40, pp. 350-355

Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. Each tagger relies on a different machine learning mechanism: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs). We discuss some of the challenges posed by tagging spoken, as opposed to written, language, in particular the wide range of dialects found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on either conditional random fields or neural networks perform much better than the rest, with the LSTM tagger achieving the highest score.

Keywords

part-of-speech tagging, spoken language, dialects, Norwegian
