Title:
Projecting named entity recognizers without annotated or parallel corpora
Author:
Jue Hou: Department of Computer Science, University of Helsinki, Finland
Maximilian W. Koppatz: Department of Computer Science, University of Helsinki, Finland
José María Hoya Quecedo: Department of Computer Science, University of Helsinki, Finland
Roman Yangarber: Department of Computer Science, University of Helsinki, Finland
Year:
2019
Conference:
Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland
Issue:
167
Article no.:
024
Pages:
232--241
No. of pages:
9
Publication type:
Abstract and Fulltext
Published:
2019-10-02
ISBN:
978-91-7929-995-8
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press
Named entity recognition (NER) is a well-researched task in NLP, but training usable models typically requires large annotated corpora. This is a problem for languages that lack such corpora, such as Finnish. We propose an approach for building a named entity recognizer without annotated or parallel documents, by leveraging the strong NER models that exist for English. We automatically gather a large amount of chronologically matched data in two languages, then project named entity annotations from the English documents onto the Finnish ones, resolving the matches with limited linguistic rules. We use this "artificially" annotated data to train a BiLSTM-CRF model. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.

Keywords: Automatic data annotation; Named Entity Recognition; Neural Network
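
The projection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the matching criterion nor the linguistic rules, so everything below is an assumption made for illustration. In particular, "chronologically matched" is taken to mean publication dates within a small window, the "limited linguistic rule" is replaced by a simple shared-prefix test that tolerates Finnish case endings, and the names `match_by_date`, `shared_prefix_len`, and `project_entities` are hypothetical.

```python
from datetime import date


def match_by_date(english_docs, finnish_docs, max_days=1):
    """Pair documents whose publication dates differ by at most max_days.

    Assumption: each document is a dict with a "date" key (datetime.date).
    """
    pairs = []
    for fi in finnish_docs:
        for en in english_docs:
            if abs((fi["date"] - en["date"]).days) <= max_days:
                pairs.append((en, fi))
    return pairs


def shared_prefix_len(a, b):
    """Length of the common prefix of two strings, ignoring case."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def project_entities(english_entities, finnish_tokens, min_prefix=4):
    """Tag Finnish tokens that share a long prefix with an English entity.

    english_entities: list of (surface, label) pairs, single-token names only
    (a simplification). A token matches if it shares all but the last two
    characters of the name, and at least min_prefix characters, so very
    short names are conservatively never projected.
    """
    tags = ["O"] * len(finnish_tokens)
    for i, token in enumerate(finnish_tokens):
        for surface, label in english_entities:
            needed = max(min_prefix, len(surface) - 2)
            if shared_prefix_len(surface, token) >= needed:
                tags[i] = "B-" + label
                break
    return tags


# Hypothetical example data, not taken from the paper.
english_docs = [
    {"id": "en-1", "date": date(2019, 10, 1)},
    {"id": "en-2", "date": date(2019, 9, 1)},
]
finnish_docs = [{"id": "fi-1", "date": date(2019, 10, 2)}]
pairs = match_by_date(english_docs, finnish_docs)

entities = [("Helsinki", "LOC"), ("Merkel", "PER")]
tokens = ["Merkel", "vieraili", "Helsingissä", "."]
tags = project_entities(entities, tokens)
```

In this sketch the inflected form "Helsingissä" still receives the location tag projected from "Helsinki", because the two share a six-character prefix. Token and tag sequences produced this way could then serve as training data for a sequence tagger such as the BiLSTM-CRF model named in the abstract.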

