Mathias Creutz
Department of Digital Humanities, Faculty of Arts, University of Helsinki, Finland
Eetu Sjöblom
Department of Digital Humanities, Faculty of Arts, University of Helsinki, Finland
Published in: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland
Linköping Electronic Conference Proceedings 164:3, p. 20-30
NEALT Proceedings Series 39:3, p. 20-30
Published: 2019-09-30
ISBN: 978-91-7929-998-9
ISSN: 1650-3686 (print), 1650-3740 (online)
It is important for language learners to practice speaking and writing in realistic scenarios. Learners also need feedback on how to express themselves better in the new language. In this paper, we perform automatic paraphrase generation on language-learner texts. Our goal is to devise tools that can help language learners write more correct and natural-sounding sentences. We use a pivoting method with a character-based neural machine translation system trained on subtitle data to paraphrase and improve learner texts that contain grammatical errors and other types of noise. We perform experiments in three languages: Finnish, Swedish and English. In addition to parallel subtitle data, we experiment with monolingual data and with error-augmented monolingual and bilingual data during training. Our results show that our baseline model, trained only on parallel bilingual data sets, is surprisingly robust to different types of noise in the source sentence, but that introducing artificial errors can improve performance. In addition to error correction, the results show promise for using the models to improve fluency and make language-learner texts more idiomatic.
Paraphrasing, Grammatical error correction, Neural machine translation, Learner language, Multilinguality, Idiomatic expressions
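
The pivoting idea mentioned in the abstract can be illustrated with a minimal sketch: a sentence is translated into a pivot language and then back into the original language, and the round trip yields a paraphrase. Everything below is an assumption made for illustration. The names `pivot_paraphrase` and `make_lookup_translator` are hypothetical, and the lookup tables are toy stand-ins for the character-based neural machine translation system, not the system described in the paper.

```python
from typing import Callable, Dict

# A translator is any function that maps a sentence to a sentence.
Translator = Callable[[str], str]


def pivot_paraphrase(sentence: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Paraphrase a sentence by translating it into a pivot language and back."""
    pivot_text = to_pivot(sentence)
    return from_pivot(pivot_text)


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Toy stand-in for a translation model (hypothetical helper).

    Returns the table entry for an exact match and the input unchanged otherwise.
    """
    def translate(text: str) -> str:
        return table.get(text, text)
    return translate


# Invented toy data: a learner sentence with an agreement error,
# pivoted through Swedish and back to English.
en_to_sv = make_lookup_translator({"I has a dog.": "Jag har en hund."})
sv_to_en = make_lookup_translator({"Jag har en hund.": "I have a dog."})

print(pivot_paraphrase("I has a dog.", en_to_sv, sv_to_en))  # prints "I have a dog."
```

In this sketch the two translators are passed in as plain callables, so a real translation model could replace the lookup tables without changing `pivot_paraphrase`.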