Conference article

Machine Learning for Rhetorical Figure Detection: More Chiasmus with Less Annotation

Marie Dubremetz
Dept. of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Joakim Nivre
Dept. of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:5, pp. 37-45

NEALT Proceedings Series 29:5, pp. 37-45

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Identifying figurative language is a hard problem for computers. In this paper we address a subproblem: chiasmus detection. By chiasmus we mean a rhetorical figure that consists of repeating two elements in reverse order, as in “First shall be last, last shall be first”. Chiasmus detection is a needle-in-the-haystack problem, with only a few true positives among millions of false positives. Because annotated data is scarce, prior work on detecting chiasmus in running text has considered only hand-tuned systems. Here we explore the use of machine learning on a partially annotated corpus. With only 31 positive instances and partial annotation of negative instances, we build a system that improves both precision and recall over a hand-tuned system using the same features. Comparing the feature weights learned by the machine with those given by the human, we discover common characteristics of chiasmus.
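
To make the needle-in-the-haystack point concrete, the following is a minimal sketch of naive candidate extraction. It is not the system described in the paper: it assumes exact matching of lowercased word forms, and the function name `chiasmus_candidates` and the 30-token `window` are hypothetical choices made for illustration. It lists every crossed repetition A ... B ... B ... A within the window.

```python
import re
from typing import List, Tuple


def chiasmus_candidates(text: str, window: int = 30) -> List[Tuple[int, int, int, int]]:
    """Return token-index tuples (a, b, b2, a2) forming an A ... B ... B ... A pattern.

    Illustrative assumptions: words are matched on their lowercased surface
    form, and all four positions must lie within `window` tokens of the first.
    """
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    candidates = []
    for a in range(n):
        limit = min(n, a + window)
        for b in range(a + 1, limit):
            if tokens[b] == tokens[a]:
                continue  # A and B must be different words
            for b2 in range(b + 1, limit):
                if tokens[b2] != tokens[b]:
                    continue  # need a repetition of B
                for a2 in range(b2 + 1, limit):
                    if tokens[a2] == tokens[a]:
                        candidates.append((a, b, b2, a2))
    return candidates


example = "First shall be last, last shall be first"
found = chiasmus_candidates(example)
print(found)
```

Even on the eight-word example from the abstract, this yields several crossed repetitions (for instance "shall ... last ... last ... shall"), of which only the "first ... last ... last ... first" pair is the intended figure. On running text the number of such spurious candidates grows quickly, which is why candidates need to be ranked or classified rather than simply extracted.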
