An Approach to Measure Pronunciation Similarity in Second Language Learning Using Radial Basis Function Kernel

Christos Koniaris
University of Gothenburg, Centre for Language Technology, Department of Philosophy, Linguistics and Theory of Science, Dialogue Technology Lab, Gothenburg, Sweden

Published in: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:6, pp. 74–86

NEALT Proceedings Series 22:6, pp. 74–86

Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper presents a method for diagnosing potential mispronunciations in second language learning. It compares the characteristics of speech produced by a group of native speakers with those of speech produced by several groups of non-native speakers from diverse language backgrounds. The comparison is made at the phoneme level, between the native auditory perception and the non-native spectral representation, using similarity measures based on the radial basis function kernel. An ordered list of problematic phonemes is derived for each non-native group, and the results are evaluated against a relevant linguistic survey from the literature. The experimental results indicate agreement with the linguistic findings of up to 80.8% for vowels and 80.3% for consonants.


pronunciation error detection; similarity measure; radial basis function kernel; phoneme; second language learning
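
The general idea of ranking phonemes by a radial basis function (RBF) kernel similarity can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact measure or the feature domains, so the sketch assumes that each phoneme is represented by a small set of hypothetical feature vectors per speaker group, and that similarity is the mean kernel value k(x, y) = exp(-gamma * ||x - y||^2) over all native/non-native pairs. The names `rbf_kernel`, `mean_similarity`, `rank_phonemes`, the parameter `gamma`, and the toy data are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_similarity(native: List[Vector], non_native: List[Vector],
                    gamma: float = 1.0) -> float:
    """Average kernel value over all native/non-native vector pairs.

    Assumption: a plain mean is used here as a stand-in for the
    paper's similarity measure, which the abstract does not specify.
    """
    total = sum(rbf_kernel(x, y, gamma) for x in native for y in non_native)
    return total / (len(native) * len(non_native))


def rank_phonemes(native: Dict[str, List[Vector]],
                  non_native: Dict[str, List[Vector]],
                  gamma: float = 1.0) -> List[str]:
    """Return phonemes ordered from least to most similar.

    Phonemes with the lowest similarity come first, i.e. they are
    treated as the most problematic for the non-native group.
    """
    scores = {
        phoneme: mean_similarity(native[phoneme], non_native[phoneme], gamma)
        for phoneme in native
        if phoneme in non_native
    }
    return sorted(scores, key=lambda phoneme: scores[phoneme])


# Toy, made-up two-dimensional features for two phonemes.
native_features = {
    "a": [[0.0, 0.0], [0.1, 0.0]],
    "y": [[1.0, 1.0], [1.0, 1.1]],
}
learner_features = {
    "a": [[0.0, 0.1], [0.1, 0.1]],  # close to the native vectors
    "y": [[2.0, 2.0], [2.1, 2.0]],  # far from the native vectors
}

ranking = rank_phonemes(native_features, learner_features)
print(ranking)
```

In this toy setting the phoneme whose learner vectors lie farther from the native ones receives the lower similarity and is listed first. The kernel width `gamma` controls how quickly similarity decays with distance and would need to be chosen for real data.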


