Conference article

Statistical Machine Translation with Readability Constraints

Sara Stymne
Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden

Jörg Tiedemann
Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden

Christian Hardmeier
Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden

Joakim Nivre
Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:34, p. 375-386

NEALT Proceedings Series 16:34, p. 375-386

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper presents experiments with document-level machine translation with readability constraints. We describe the task of producing simplified translations from a given source, with the aim of optimizing machine translation for specific target users such as language learners. In our approach, we introduce global features that are known to affect readability into a document-level SMT decoding framework. We show that the decoder is capable of incorporating those features and that we can influence the readability of the output as measured by common metrics. This study presents the first attempt at jointly performing machine translation and text simplification, which is demonstrated through the case of translating parliamentary texts from English to Swedish.

Keywords

Machine Translation; Text Simplification; Readability
