Keywords: Digital Humanities; Natural Language Processing; Historical Text; Normalisation; Levenshtein Edit Distance; Compound Splitting; Part-of-Speech Tagging; Underresourced Languages; Less-Resourced Languages
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Ågren, M., Fiebranz, R., Lindberg, E., and Lindström, J. (2011). Making verbs count. The research project ‘Gender and Work’ and its methodology. Scandinavian Economic History Review, 59(3):271–291. Forthcoming.
Baron, A. and Rayson, P. (2008). Vard2: A tool for dealing with spelling variation in historical corpora. In Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham.
Black, A. W. and Taylor, P. (1997). Festival speech synthesis system: system documentation. Technical report, University of Edinburgh, Centre for Speech Technology Research.
Bollmann, M. (2012). (semi-)Automatic Normalization of Historical Texts using Distance Measures and the norma tool. In Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2).
Bollmann, M., Petran, F., and Dipper, S. (2011). Rule-based normalization of historical texts. In Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, pages 34–42, Hissar, Bulgaria.
Borin, L., Forsberg, M., and Lönngren, L. (2008). Saldo 1.0 (svenskt associationslexikon version 2). Språkbanken, University of Gothenburg.
Brants, T. (2000). TnT - a statistical part-of-speech tagger. In Proceedings of the 6th Applied Natural Language Processing Conference (ANLP), Seattle, Washington, USA.
Ejerhed, E. and Källgren, G. (1997). Stockholm Umeå Corpus. Version 1.0. Produced by Department of Linguistics, Umeå University and Department of Linguistics, Stockholm University. ISBN 91-7191-348-3.
Halácsy, P., Kornai, A., and Oravecz, C. (2007). HunPos - an open source trigram tagger. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 209–212, Prague, Czech Republic.
Jurish, B. (2008). Finding canonical forms for historical German text. In Storrer, A., Geyken, A., Siebert, A., and Würzner, K.-M., editors, Text Resources and Lexical Knowledge: Selected Papers from the 9th Conference on Natural Language Processing (KONVENS 2008), pages 27–37. Mouton de Gruyter, Berlin.
Jurish, B. (2010). More Than Words: Using Token Context to Improve Canonicalization of Historical German. Journal for Language Technology and Computational Linguistics, 25(1):23–39.
Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR), 24(4):377–439.
Levenshtein, V. (1966). Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10(8):707–710.
Pettersson, E., Megyesi, B., and Nivre, J. (2012). Rule-based normalisation of historical text - a diachronic study. In Proceedings of the First International Workshop on Language Technology for Historical Text(s), Vienna, Austria.
Rayson, P., Archer, D., and Nicholas, S. (2005). VARD versus Word – A comparison of the UCREL variant detector and modern spell checkers on English historical corpora. In Proceedings from the Corpus Linguistics Conference Series on-line e-journal, volume 1, Birmingham, UK.
Stymne, S. (2008). German compounds in factored statistical machine translation. In Ranta, A. and Nordström, B., editors, Proceedings of GoTAL, 6th International Conference on Natural Language Processing, volume 5221, pages 464–475, Gothenburg, Sweden. Springer LNCS/LNAI.
Stymne, S. and Holmqvist, M. (2008). Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation. In Proceedings of the 12th EAMT conference, Hamburg, Germany.