Conference article

Chunking Historical German

Katrin Ortmann

Download article

Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.

Linköping Electronic Conference Proceedings 178:19, p. 190-199

NEALT Proceedings Series 45:19, p. 190-199

Show more +

Published: 2021-05-21

ISBN: 978-91-7929-614-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Quantitative studies of historical syntax require large amounts of syntactically annotated data, which are rarely available. The application of NLP methods could reduce manual annotation effort, provided that they achieve sufficient levels of accuracy. The present study investigates the automatic identification of chunks in historical German texts. Because no training data exists for this task, chunks are extracted from modern and historical constituency treebanks and used to train a CRF-based neural sequence labeling tool. The evaluation shows that the neural chunker outperforms an unlexicalized baseline and achieves overall F-scores between 90% and 94% for different historical data sets when POS tags are used as feature. The conducted experiments demonstrate the usefulness of including historical training data while also highlighting the importance of reducing boundary errors to improve annotation precision.

Keywords

chunking, German, historical language

References

No references available

Citations in Crossref