Conference article

Morphosyntactic Disambiguation in an Endangered Language Setting

Jeff Ens
School of Interactive Arts & Technology, Simon Fraser University, Canada

Mika Hämäläinen
Department of Digital Humanities, University of Helsinki, Finland

Jack Rueter
Department of Digital Humanities, University of Helsinki, Finland

Philippe Pasquier
School of Interactive Arts & Technology, Simon Fraser University, Canada

Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland

Linköping Electronic Conference Proceedings 167:39, pp. 345-349

NEALT Proceedings Series 42:39, pp. 345-349

Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Endangered Uralic languages have a wide variety of inflectional forms in their morphology. This produces many homonymous inflected forms, which introduces considerable morphological ambiguity in sentences. Previous research has employed constraint grammars (CGs) to address this problem; however, CGs are often unable to fully disambiguate a sentence, and their development is labour-intensive. We present an LSTM-based model for automatically ranking morphological readings of sentences based on their quality. This ranking can be used to evaluate existing CG disambiguators or to morphologically disambiguate sentences directly. Our approach works on a morphological abstraction and can be trained with a very small dataset.
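To make the idea of ranking sentence-level readings concrete, here is a minimal sketch and not the authors' implementation. It assumes each token comes with a list of candidate tag strings (as a morphological analyser might produce), enumerates every combination as one sentence reading, and sorts the readings by a score. The tag strings, the helper names, and the hand-written `toy_score` table are all hypothetical; in the paper the score would come from the trained LSTM, which is not reproduced here. The scorer looks only at tags, never at word forms, which loosely mirrors working on a morphological abstraction.

```python
from itertools import product

def pos(tag):
    """Return the part-of-speech prefix of a tag string such as 'N+Sg+Nom'."""
    return tag.split("+")[0]

def enumerate_readings(analyses):
    """Build every sentence reading: one candidate tag chosen per token."""
    return [list(reading) for reading in product(*analyses)]

def toy_score(reading):
    """Hypothetical stand-in for a trained model: rewards a few POS bigrams."""
    good_bigrams = {("N", "V"): 1.0, ("A", "N"): 1.0, ("V", "N"): 0.5}
    pairs = zip(reading, reading[1:])
    return sum(good_bigrams.get((pos(a), pos(b)), 0.0) for a, b in pairs)

def rank_readings(analyses, score=toy_score):
    """Sort all sentence readings from highest to lowest score."""
    return sorted(enumerate_readings(analyses), key=score, reverse=True)

# Illustrative (invented) analyses for a three-token sentence.
analyses = [
    ["A+Sg+Nom", "V+Inf"],           # token 1 is ambiguous
    ["N+Sg+Nom", "V+Ind+Prs+Sg3"],   # token 2 is ambiguous
    ["V+Ind+Prs+Sg3"],               # token 3 is unambiguous
]

ranked = rank_readings(analyses)
best = ranked[0]
print(best)
```

Taking the top-ranked reading corresponds to direct disambiguation; comparing where a CG disambiguator's output falls in the ranking is one way such a ranking could be used for evaluation. Note that exhaustive enumeration grows exponentially with sentence length, so a practical system would need pruning or a search strategy.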

Keywords

Disambiguation, FST, LSTM, CG, Uralic languages
