Conference article

Finite-state relations between two historically closely related languages

Kimmo Koskenniemi
University of Helsinki, Finland


Part of: Proceedings of the Workshop on Computational Historical Linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18

Linköping Electronic Conference Proceedings 87:4, p. 53-53

NEALT Proceedings Series 18:4, p. 53-53


Published: 2013-05-17

ISBN: 978-91-7519-587-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Regular correspondences between historically related languages can be modelled using finite-state transducers (FSTs). A new method is presented and demonstrated with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a protolanguage) is established between the two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on the letter-by-letter alignment of the two languages and uses mechanically constructed morphophonemes which represent the corresponding characters. By describing the constraints of AFE using two-level rules, one may construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping from Estonian to Finnish is mildly ambiguous. The steps of the proposed method could be repeated as such with dialectal or older written texts. Choosing a set of model words, aligning them, recording the mechanical correspondences and designing rules for the constraints could be done with limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language.
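
The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the author's implementation: plain Python sets stand in for finite-state transducers, and a hand-written predicate stands in for two-level rules. The three aligned word pairs, the zero symbol `Ø`, the two context rules in `allowed`, and the tiny Estonian word list are all assumptions chosen for illustration, and the toy data only has zeros on the Estonian side.

```python
from itertools import product

ZERO = "Ø"  # assumed symbol for "no letter" in an alignment

# Hypothetical model words, already aligned letter by letter (Finnish, Estonian).
ALIGNED = [
    ("kala", "kala"),
    ("silmä", "silmØ"),
    ("jalka", "jalgØ"),
]


def morphophonemes(fin, est):
    """Build the AFE-style symbol sequence for one aligned pair.

    Identical letters yield the letter itself; differing letters yield a
    mechanically constructed two-character symbol such as "kg" or "aØ".
    """
    assert len(fin) == len(est), "aligned forms must have equal length"
    return tuple(f if f == e else f + e for f, e in zip(fin, est))


def collect_pairs(aligned):
    """Record which Estonian letters were seen opposite each Finnish letter."""
    by_fin = {}
    for fin, est in aligned:
        for f, e in zip(fin, est):
            by_fin.setdefault(f, set()).add(e)
    return by_fin


def allowed(afe):
    """Toy stand-in for two-level rules constraining the AFE sequence.

    Assumed rules: "aØ" may occur only word-finally; "kg" only after "l".
    """
    for i, sym in enumerate(afe):
        if sym == "aØ" and i != len(afe) - 1:
            return False
        if sym == "kg" and (i == 0 or afe[i - 1] != "l"):
            return False
    return True


def candidates(word, by_fin, constraint=None, lexicon=None):
    """Map a Finnish word to Estonian candidates via the AFE level.

    Without a constraint, every recorded correspondence is tried, which is
    the ambiguous mapping. The constraint filters AFE sequences, and the
    optional lexicon plays the role of a speller of the target language.
    """
    options = [sorted(by_fin[ch]) for ch in word]
    results = set()
    for est_letters in product(*options):
        afe = morphophonemes(word, est_letters)
        if constraint is not None and not constraint(afe):
            continue
        est = "".join(est_letters).replace(ZERO, "")
        if lexicon is not None and est not in lexicon:
            continue
        results.add(est)
    return results


BY_FIN = collect_pairs(ALIGNED)
ESTONIAN_WORDS = {"kala", "silm", "jalg"}  # hypothetical speller word list

if __name__ == "__main__":
    print(sorted(candidates("kala", BY_FIN)))
    print(sorted(candidates("kala", BY_FIN, allowed)))
    print(sorted(candidates("jalka", BY_FIN, allowed, ESTONIAN_WORDS)))
```

With this toy data, the unconstrained mapping of "kala" produces eight candidates, the rule filter narrows them to two, and the word-list filter leaves one. A real setup would compile the correspondences and rules into transducers with a toolkit such as HFST or Foma rather than enumerating strings.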

Keywords

Finite-State Transducers, Historical Linguistics, HFST, Two-Level Morphology, Foma

