Conference article

Chunking Historical German

Katrin Ortmann

Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.

Linköping Electronic Conference Proceedings 178:19, pp. 190-199

Published: 2021-05-21

ISBN: 978-91-7929-614-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Quantitative studies of historical syntax require large amounts of syntactically annotated data, which are rarely available. NLP methods could reduce the manual annotation effort, provided that they achieve sufficient accuracy. The present study investigates the automatic identification of chunks in historical German texts. Because no training data exists for this task, chunks are extracted from modern and historical constituency treebanks and used to train a CRF-based neural sequence labeling tool. The evaluation shows that the neural chunker outperforms an unlexicalized baseline and achieves overall F-scores between 90% and 94% on different historical data sets when POS tags are used as a feature. The experiments demonstrate the usefulness of including historical training data and also highlight the importance of reducing boundary errors to improve annotation precision.
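
To illustrate why boundary errors weigh on precision, the following minimal sketch computes chunk-level precision, recall, and F-score from BIO-tagged sequences under exact-match scoring. It is not the evaluation code of the paper: the function names, the BIO encoding, and the chunk labels `NC` and `PC` are assumptions made for this example only.

```python
from typing import List, Set, Tuple

# A chunk is (label, start index, end index exclusive). Hypothetical representation.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect chunks from a BIO tag sequence such as ["B-NC", "I-NC", "O"]."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" closes a final chunk
        # An open chunk ends at "O", at any "B-" tag, or when the label changes.
        if label is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            spans.add((label, start, i))
            start, label = None, None
        # Open a new chunk at any non-"O" tag when none is open.
        if tag != "O" and label is None:
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted chunk counts only if label and both boundaries agree."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags (assumed labels): the predicted second chunk ends one token early.
gold = ["B-NC", "I-NC", "O", "B-PC", "I-PC", "I-PC"]
pred = ["B-NC", "I-NC", "O", "B-PC", "I-PC", "O"]
print(precision_recall_f1(gold, pred))
```

In this toy case a single misplaced boundary turns an otherwise correct chunk into both a false positive and a false negative, so one error lowers precision and recall at the same time.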

Keywords

chunking, German, historical language
