Andreas Søeborg Kirkedal
Department of International Business Communication, CBS, Frederiksberg, Denmark
Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Linköping Electronic Conference Proceedings 85:29, p. 321-330
NEALT Proceedings Series 16:29, p. 321-330
Published: 2013-05-17
ISBN: 978-91-7519-589-6
ISSN: 1650-3686 (print), 1650-3740 (online)
Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions, and a pronunciation dictionary. The dictionary, or lexicon, maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper presents an analysis of the phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus. The analysis indicates that transcribing features such as stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, which improves word error rate by 1% and sentence error rate by 3.8%.
Automatic speech recognition; phonetics; phonology; speech; phonetic transcription
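The abstract reports results as word error rate (WER) and sentence error rate (SER). As a reading aid, the following is a minimal sketch of how these two metrics are commonly defined: WER as word-level edit distance divided by the number of reference words, and SER as the fraction of sentences with at least one error. The function names `edit_distance` and `error_rates` and the toy sentences are assumptions made for illustration; this is not the scoring tool used in the paper, and it does not reproduce the paper's numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return (WER, SER) for parallel lists of whitespace-tokenised sentences."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_words = 0
    word_errors = 0
    sentence_errors = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_words += len(ref_tokens)
        word_errors += edit_distance(ref_tokens, hyp_tokens)
        if ref_tokens != hyp_tokens:
            sentence_errors += 1
    if total_words == 0:
        raise ValueError("references contain no words")
    return word_errors / total_words, sentence_errors / len(references)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the DK-PAROLE corpus.
    refs = ["a b c d", "e f"]
    hyps = ["a x c d", "e f"]
    wer, ser = error_rates(refs, hyps)
    print(f"WER = {wer:.3f}, SER = {ser:.3f}")
```

Because SER counts a sentence as wrong after a single word error, a small change in WER can produce a larger change in SER, which is consistent with the two figures in the abstract differing in size.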