Conference article

Toward automatic improvement of language produced by non-native language learners

Mathias Creutz
Department of Digital Humanities, Faculty of Arts, University of Helsinki, Finland

Eetu Sjöblom
Department of Digital Humanities, Faculty of Arts, University of Helsinki, Finland

Published in: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 164:3, pp. 20-30

NEALT Proceedings Series 39:3, pp. 20-30

Published: 2019-09-30

ISBN: 978-91-7929-998-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

It is important for language learners to practice speaking and writing in realistic scenarios. Learners also need feedback on how to express themselves better in the new language. In this paper, we perform automatic paraphrase generation on language-learner texts. Our goal is to devise tools that can help language learners write more correct and natural-sounding sentences. We use a pivoting method with a character-based neural machine translation system trained on subtitle data to paraphrase and improve learner texts that contain grammatical errors and other types of noise. We perform experiments in three languages: Finnish, Swedish and English. During training, we experiment with monolingual data as well as error-augmented monolingual and bilingual data, in addition to parallel subtitle data. Our results show that our baseline model, trained only on parallel bilingual data sets, is surprisingly robust to different types of noise in the source sentence, but that introducing artificial errors can improve performance. In addition to error correction, the results show promise for using the models to improve fluency and make language-learner texts more idiomatic.

Keywords

Paraphrasing, Grammatical error correction, Neural machine translation, Learner language, Multilinguality, Idiomatic expressions
