Sentence Compression For Automatic Subtitling

Juhani Luotolahti
Department of Information Technology, University of Turku, Finland

Filip Ginter
Department of Information Technology, University of Turku, Finland


Part of: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:18, p. 135-143

NEALT Proceedings Series 23:18, p. 135-143


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper investigates sentence compression for automatic subtitle generation using supervised machine learning. We present a sentence compression method and discuss both the generation of training data from compressed Finnish sentences and different approaches to the problem. Our method outperforms a state-of-the-art baseline in both automatic and human evaluation. On real data, 44.9% of the sentences produced by the compression algorithm were judged usable as-is or after minor edits.
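To make the training-data step more concrete: one common way to frame extractive sentence compression as a supervised learning problem is as a per-token keep/delete decision. The Python sketch below derives such labels from an original sentence and a compressed version of it. It assumes whitespace tokenization and that the compressed sentence is a pure deletion of the original (i.e. a subsequence of it); real subtitles often paraphrase, so such pairs would have to be filtered out or aligned differently. The function name `deletion_labels` and the example sentences are hypothetical, and this is a minimal illustration, not the procedure used in the paper.

```python
from typing import List, Tuple


def deletion_labels(original: List[str], compressed: List[str]) -> List[Tuple[str, int]]:
    """Label each original token 1 (keep) or 0 (delete).

    Hypothetical helper for illustration. Assumes the compressed sentence is
    a subsequence of the original sentence (deletion-only compression).
    """
    labels: List[Tuple[str, int]] = []
    j = 0  # index of the next unmatched token in the compressed sentence
    for token in original:
        # Greedy left-to-right matching: keep the token if it is the next
        # token we still need to produce in the compressed sentence.
        if j < len(compressed) and token == compressed[j]:
            labels.append((token, 1))
            j += 1
        else:
            labels.append((token, 0))
    if j != len(compressed):
        # Some compressed tokens were never matched, so the pair is not
        # deletion-only (e.g. the subtitle paraphrases the original).
        raise ValueError("compressed sentence is not a subsequence of the original")
    return labels


if __name__ == "__main__":
    original = "Se on minun mielestäni todella hyvä idea".split()
    compressed = "Se on hyvä idea".split()
    for token, keep in deletion_labels(original, compressed):
        print(f"{token}\t{'KEEP' if keep else 'DELETE'}")
```

Greedy left-to-right matching always finds an alignment when one exists, but when a kept word occurs more than once in the original sentence it is attached to the earliest occurrence, which may not be the one the human editor intended.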




