Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

José Carlos Rosales Nuñez
Université Paris Sud, LIMSI, France / Université Paris Saclay, France / INRIA Paris, France

Djamé Seddah
INRIA Paris, France

Guillaume Wisniewski
Université Paris Sud, LIMSI, France / Université Paris Saclay, France


In: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland

Linköping Electronic Conference Proceedings 167:1, pp. 2--14

NEALT Proceedings Series 42:1, p. 2--14


Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)


This work compares the performance of Phrase-Based Statistical Machine Translation (PBSMT) systems and attention-based Neural Machine Translation (NMT) systems when translating User-Generated Content (UGC), as encountered on social media, from French to English. We show that, contrary to what could be expected, PBSMT outperforms NMT when translating non-canonical inputs. Our error analysis uncovers the specificities of UGC that are problematic for sequential NMT architectures and suggests new avenues for improving NMT models.


Keywords: Machine Translation, User Generated Content, Neural Machine Translation, PBSMT

