Conference article

Automatic word stress annotation of Russian unrestricted text

Robert Reynolds
HSL Faculty, UiT The Arctic University of Norway, Tromsø, Norway

Francis Tyers
HSL Faculty, UiT The Arctic University of Norway, Tromsø, Norway


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:22, pp. 173-180

NEALT Proceedings Series 23:22, pp. 173-180


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We evaluate the effectiveness of finite-state tools that we developed for automatically annotating word stress in unrestricted Russian text. This task is relevant for computer-assisted language learning and text-to-speech. To our knowledge, this is the first study to empirically evaluate the results of this task. Given an adequate lexicon with specified stress, the primary obstacle to correct stress placement is disambiguating homographic wordforms. The baseline accuracy for this task is 90.07% (known words only, no morphosyntactic disambiguation). Using a constraint grammar to disambiguate homographs, we achieve 93.21% accuracy with minimal errors. For applications that can tolerate more errors, we achieve 96.15% accuracy by incorporating frequency-based guessing and a simple algorithm for guessing the stress position in unknown words. These results highlight the need for morphosyntactic disambiguation in the word stress placement task for Russian, and set a standard for future research on this task.
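The role of homographs in this task can be illustrated with a minimal Python sketch. It is not the authors' finite-state or constraint grammar implementation: the toy lexicon, its frequency counts, the function names, and the penultimate-vowel fallback for unknown words are all assumptions made for illustration only. The sketch marks stress when every known reading of a wordform agrees, and otherwise either leaves the word unmarked or, if guessing is enabled, picks the most frequent reading.

```python
ACUTE = "\u0301"  # combining acute accent, placed after the stressed vowel
VOWELS = "аеёиоуыэюя"


def stress(word, vowel_index):
    """Return word with an acute accent on its vowel_index-th vowel (0-based)."""
    seen = -1
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            seen += 1
            if seen == vowel_index:
                return word[:i + 1] + ACUTE + word[i + 1:]
    return word  # no such vowel: leave the word unmarked


# Hypothetical toy lexicon: wordform -> list of (stressed vowel index, frequency).
# The frequency counts are invented for this example.
LEXICON = {
    "замок": [(0, 70), (1, 30)],  # 'castle' (first vowel) vs. 'lock' (second vowel)
    "книга": [(0, 100)],          # 'book', a single reading
}


def annotate(word, guess=False):
    """Annotate stress; with guess=False, ambiguous or unknown words stay unmarked."""
    readings = LEXICON.get(word.lower())
    if readings is None:
        if not guess:
            return word
        # Placeholder heuristic (an assumption, not the paper's algorithm):
        # stress the penultimate vowel, or the only vowel if there is just one.
        n_vowels = sum(ch.lower() in VOWELS for ch in word)
        return stress(word, max(n_vowels - 2, 0))
    positions = {pos for pos, _ in readings}
    if len(positions) == 1:
        return stress(word, positions.pop())  # all readings agree
    if not guess:
        return word  # homograph with conflicting stress: abstain
    best_position, _ = max(readings, key=lambda r: r[1])
    return stress(word, best_position)  # frequency-based guess
```

In this sketch, `annotate("замок")` returns the word unchanged because its two readings disagree, while `annotate("замок", guess=True)` commits to the more frequent reading. A morphosyntactic disambiguator would sit before the agreement check and discard readings that do not fit the sentence context, so that fewer words reach the guessing step.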


