Normalizing Medieval German Texts: from rules to deep learning

Natalia Korchagina
Institute of Computational Linguistics, University of Zurich, Switzerland

Published in: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

Linköping Electronic Conference Proceedings 133:4, pp. 12-17

NEALT Proceedings Series 32:4, pp. 12-17

Published: 2017-05-10

ISBN: 978-91-7685-503-4

ISSN: 1650-3686 (print), 1650-3740 (online)


The application of NLP tools to historical texts is complicated by a high level of spelling variation. Different methods of historical text normalization have been proposed. In this comparative evaluation, I test three approaches to text canonicalization on historical German texts from the 15th–16th centuries: rule-based normalization, statistical machine translation, and neural machine translation. Character-based neural machine translation, which had not previously been tested for the task of normalization, showed the best results.


