Ambiguity in Semantically Related Word Substitutions: an investigation in historical Bible translations

Maria Moritz
Institute of Computer Science, University of Goettingen, Germany

Marco Büchler
Institute of Computer Science, University of Goettingen, Germany

Part of: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

Linköping Electronic Conference Proceedings 133:5, pp. 18-23

NEALT Proceedings Series 32:5, pp. 18-23

Published: 2017-05-10

ISBN: 978-91-7685-503-4

ISSN: 1650-3686 (print), 1650-3740 (online)


Text reuse is a common way to transfer historical texts. It refers to the repetition of text in a new context and ranges from near-verbatim (literal) and paraphrasal reuse to completely non-literal reuse (e.g., allusions or translations). To improve the detection of reuse in historical texts, we need to better understand its characteristics. In this work, we investigate the relationship between paraphrasal reuse and word senses. Specifically, we test the conjecture that words with ambiguous senses are less prone to replacement in paraphrasal text reuse. Our corpus comprises three historical English Bibles, one of which has previously been annotated with word senses. We perform automated word-sense disambiguation based on supervised learning. By investigating this conjecture, we aim to understand whether unambiguous words are preferred for replacement when text is reused and, consequently, whether they could serve as a discriminating feature for reuse detection.


