Conference article

Experiments on sentence segmentation in Old Swedish editions

Gerlof Bouma
Språkbanken, Department of Swedish, University of Gothenburg, Sweden

Yvonne Adesam
Språkbanken, Department of Swedish, University of Gothenburg, Sweden

Published in: Proceedings of the workshop on computational historical linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18

Linköping Electronic Conference Proceedings 87:2, p. 11-26

NEALT Proceedings Series 18:2, p. 11-26

Published: 2013-05-17

ISBN: 978-91-7519-587-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present experiments on automatic segmentation of electronic Old Swedish editions into sentence-like units. Our target material is characterized by a great variation in the type of boundaries that are marked orthographically, the extent of boundary marking, and the means of boundary marking. We begin with an exploration of boundary marking in a large, unannotated corpus of Old Swedish texts. Then we show that we are able to improve upon a simple but effective segmenting baseline, using a conditional random field model trained on a manually annotated corpus. A more valuable lesson the modelling teaches us, however, is that we need to address the boundary marking variation explicitly.

Keywords

Sentence-like units; boundary detection; Old Swedish
