Machine Learning for Rhetorical Figure Detection: More Chiasmus with Less Annotation

Marie Dubremetz
Dept. of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Joakim Nivre
Dept. of Linguistics and Philology, Uppsala University, Uppsala, Sweden

In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:5, pp. 37-45

NEALT Proceedings Series 29:5, pp. 37-45

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


Figurative language identification is a hard problem for computers. In this paper we handle a subproblem: chiasmus detection. By chiasmus we understand a rhetorical figure that consists in repeating two elements in reverse order: “First shall be last, last shall be first”. Chiasmus detection is a needle-in-the-haystack problem, with a couple of true positives for millions of false positives. Due to a lack of annotated data, prior work on detecting chiasmus in running text has only considered hand-tuned systems. In this paper, we explore the use of machine learning on a partially annotated corpus. With only 31 positive instances and partial annotation of negative instances, we manage to build a system that improves both precision and recall compared to a hand-tuned system using the same features. Comparing the feature weights learned by the machine to those given by the human, we discover common characteristics of chiasmus.
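To make the needle-in-the-haystack point concrete, here is a minimal sketch of how candidate instances could be enumerated before any ranking. It is not the system described in the paper: the function name `chiasmus_candidates`, the matching on lowercased tokens, and the `window` parameter with its default value are all assumptions made for illustration. It lists every criss-cross pattern A ... B ... B ... A in a token sequence.

```python
from typing import List, Tuple


def chiasmus_candidates(tokens: List[str], window: int = 30) -> List[Tuple[int, int, int, int]]:
    """Enumerate index tuples (i, j, k, l) with i < j < k < l such that
    tokens[i] matches tokens[l] and tokens[j] matches tokens[k].

    Assumptions (hypothetical, for illustration only):
    - words match if their lowercased forms are equal;
    - the whole pattern must fit within `window` tokens;
    - the outer and inner words must differ.
    """
    words = [t.lower() for t in tokens]
    n = len(words)
    candidates = []
    for i in range(n):
        end = min(n, i + window)
        # The outer pair needs room for two inner tokens, so l starts at i + 3.
        for l in range(i + 3, end):
            if words[l] != words[i]:
                continue
            for j in range(i + 1, l - 1):
                if words[j] == words[i]:
                    continue  # inner word must differ from outer word
                for k in range(j + 1, l):
                    if words[k] == words[j]:
                        candidates.append((i, j, k, l))
    return candidates


example = "First shall be last last shall be first".split()
for cand in chiasmus_candidates(example):
    print(cand, [example[x] for x in cand])
```

Even on the eight-word example from the abstract, the sketch returns several candidates, of which only the "first ... last ... last ... first" pattern is the intended figure. On running text the number of such accidental repetitions grows quickly, which is why a ranking or classification step on top of candidate extraction is needed.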


