Conference article

Named-Entity Recognition for Norwegian

Bjarte Johansen
Digital Centre of Excellence, Equinor ASA, Stavanger, Norway

Part of: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland

Linköping Electronic Conference Proceedings 167:23, pp. 222--231

NEALT Proceedings Series 42:23, pp. 222--231

Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Named-entity recognition (NER) is the task of recognizing and demarcating the segments of a document that are part of a name and determining which type of name each segment is. We use four categories of names: locations (LOC), miscellaneous (MISC), organizations (ORG), and persons (PER). Even though we employ state-of-the-art methods that work well for English, including sub-word embeddings, we are unable to reproduce the same success for the Norwegian written forms. However, our model performs better than any model reported in previous research on Norwegian text. The study also presents the first NER model for Nynorsk. Lastly, we find that combining Nynorsk and Bokmål into one training corpus improves the performance of our model on both.
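
To make the task definition concrete, the following minimal sketch shows what "demarcating segments and assigning a type" can look like in practice. It assumes a BIO tagging scheme (B- opens a name, I- continues it, O is outside any name); the abstract does not state which encoding the paper uses. The function name `extract_entities` and the example sentence are hypothetical and only illustrate the four categories named above.

```python
from typing import List, Tuple

# The four name categories listed in the abstract.
CATEGORIES = {"LOC", "MISC", "ORG", "PER"}


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (name, category) pairs from BIO tags.

    BIO is an assumed encoding, not confirmed by the source.
    An I- tag with no open span of the same category starts a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    span: List[str] = []
    category = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")

        # Continue the open span only on a matching I- tag.
        if prefix == "I" and span and label == category:
            span.append(token)
            continue

        # Any other tag closes the open span, if there is one.
        if span:
            entities.append((" ".join(span), category))
            span, category = [], None

        # B- (or a stray I-) with a known category opens a new span.
        if prefix in ("B", "I") and label in CATEGORIES:
            span, category = [token], label

    # Close a span that runs to the end of the sentence.
    if span:
        entities.append((" ".join(span), category))
    return entities


# Illustrative sentence, not taken from the paper's corpus.
tokens = ["Bjarte", "Johansen", "works", "at", "Equinor", "in", "Stavanger", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "O"]
print(extract_entities(tokens, tags))
```

Each returned pair holds one demarcated segment and its category, which is the output the task definition describes.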

Keywords

Natural Language Processing, NLP, Named-Entity Recognition, NER, Named Entity, Norwegian, Nynorsk, Bokmål, Deep learning
