Conference article

Aligning phonemes using finite-state methods

Kimmo Koskenniemi
University of Helsinki, Helsinki, Finland

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:7, p. 56-64

NEALT Proceedings Series 29:7, p. 56-64

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


The paper presents two finite-state methods for aligning pairs of cognate words or sets of different allomorphs of stems. Both methods use weighted finite-state machines to choose the best alignment among the alternatives. Individual letter or phoneme correspondences can be weighted according to various principles, e.g. using distinctive features. Comparing just two forms at a time is the simpler task, so that method is easier to refine to include context conditions. Both methods are language-independent and could be tuned for and applied to several types of languages to produce gold-standard data. The algorithms were implemented as short Python programs using the HFST finite-state library. The paper demonstrates that solving some non-trivial problems has become easier and more accessible to a wider range of scholars.
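
The idea of picking the lowest-weight alignment of two forms can be illustrated with a minimal sketch. It does not use HFST and is not the paper's implementation: it swaps in a plain dynamic-programming search, which finds the same kind of cheapest path that a weighted transducer would select. The weights, the vowel set, the zero symbol `Ø`, and the function names `pair_weight` and `align` are all assumptions made for illustration only.

```python
# Illustrative only: hypothetical weights, not those used in the paper.
ZERO = "Ø"                      # assumed symbol for "no phoneme here"
VOWELS = set("aeiouyäö")        # assumed vowel inventory


def pair_weight(a, b):
    """Weight of one correspondence a:b (lower = more plausible)."""
    if a == b:
        return 0                # identical letters
    if a == ZERO or b == ZERO:
        return 3                # insertion or deletion
    if (a in VOWELS) == (b in VOWELS):
        return 1                # vowel:vowel or consonant:consonant
    return 5                    # vowel:consonant mismatch


def align(w1, w2):
    """Return (total_weight, list of (a, b) pairs) for the cheapest alignment."""
    n, m = len(w1), len(w2)
    # best[i][j] = (weight, backpointer) for aligning w1[:i] with w2[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, None)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:     # pair w1[i-1] with w2[j-1]
                w = pair_weight(w1[i - 1], w2[j - 1])
                candidates.append((best[i - 1][j - 1][0] + w, (i - 1, j - 1)))
            if i > 0:               # pair w1[i-1] with zero
                w = pair_weight(w1[i - 1], ZERO)
                candidates.append((best[i - 1][j][0] + w, (i - 1, j)))
            if j > 0:               # pair zero with w2[j-1]
                w = pair_weight(ZERO, w2[j - 1])
                candidates.append((best[i][j - 1][0] + w, (i, j - 1)))
            best[i][j] = min(candidates)
    # Follow the backpointers from the end to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = best[i][j][1]
        a = w1[pi] if pi < i else ZERO
        b = w2[pj] if pj < j else ZERO
        pairs.append((a, b))
        i, j = pi, pj
    pairs.reverse()
    return best[n][m][0], pairs
```

Under these assumed weights, `align("käsi", "käde")` yields a total weight of 2 with the pairs k:k, ä:ä, s:d, i:e, and `align("kala", "hal")` yields a weight of 4 with the final a paired with `Ø`. Replacing `pair_weight` with a function based on distinctive features would change which alignment wins without changing the search itself.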


