Kevin Knight
ISI, University of Southern California, USA
Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Linköping Electronic Conference Proceedings 109:1, p. xi
NEALT Proceedings Series 23:1, p. xi
Published: 2015-05-06
ISBN: 978-91-7519-098-3
ISSN: 1650-3686 (print), 1650-3740 (online)
It is well known that natural language has built-in redundancy. By using context, we can often guess the next word or character in a text. Two practical communities have independently exploited this fact. First, automatic speech and translation researchers build language models to distinguish fluent from non-fluent outputs. Second, text compression researchers convert predictions into short encodings to save disk space and bandwidth. I will explore what these two communities can learn from each other's (interestingly different) solutions. Then I will look at the less-studied question of redundancy in bilingual text, addressing questions like "How well can we predict human translator behavior?" and "How much information does a human translator add to the original?" (This is joint work with Barret Zoph and Marjan Ghazvininejad.)
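As a rough illustration of the link between the two communities described in the abstract (this sketch is not part of the original and its names, toy corpus, and smoothing choice are illustrative assumptions), the Python snippet below trains a simple character bigram model and reports the ideal code length, in bits per character, that an entropy coder could achieve under those predictions: better prediction means shorter encodings.

    import math
    from collections import Counter, defaultdict

    def train_bigram(text):
        """Count character bigrams to estimate P(next char | previous char)."""
        counts = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
        return counts

    def bits_per_char(counts, text, alpha=1.0):
        """Average ideal code length -log2 p under the bigram model,
        with add-alpha smoothing so unseen characters keep nonzero mass."""
        alphabet = set(text)
        V = len(alphabet)
        total_bits = 0.0
        for prev, nxt in zip(text, text[1:]):
            c = counts[prev]
            p = (c[nxt] + alpha) / (sum(c.values()) + alpha * V)
            total_bits += -math.log2(p)
        return total_bits / max(1, len(text) - 1)

    if __name__ == "__main__":
        corpus = "the cat sat on the mat. the cat ate the rat."
        model = train_bigram(corpus)
        print(f"bits/char on training text: {bits_per_char(model, corpus):.2f}")

A compressor in the PPM family pairs predictions of this kind with an arithmetic coder, so each character costs roughly -log2 p(char | context) bits, which is why stronger language models directly translate into smaller compressed files.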