Using sub-word n-gram models for dealing with OOV in large vocabulary speech recognition for Latvian

Askars Salimbajevs
Tilde, Riga, Latvia

Jevgenijs Strigins
Tilde, Riga, Latvia


Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:37, pp. 281-285

NEALT Proceedings Series 23:37, pp. 281-285


Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


In Latvian, a single word can have tens or even hundreds of surface forms, which is a serious problem for large vocabulary speech recognition. Including every form makes the vocabulary intractably large, yet even with a vocabulary of 400K the out-of-vocabulary (OOV) rate remains very high. In this paper, we investigate sub-word vocabularies in which words are split into frequent, common parts. The results of our experiment show that this approach significantly reduces the OOV rate.
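The effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' segmentation method: the stem and ending lists, the `+` boundary marker, and the function names are all hypothetical, chosen only to show why splitting inflected forms into shared parts can lower the OOV rate on unseen forms.

```python
# Toy illustration only: hand-written stems/endings are assumptions,
# not the segmentation used in the paper.
STEMS = {"rok", "gald"}
ENDINGS = {"a", "ai", "u", "s", "am"}


def split_word(word):
    """Split into 'stem+' and '+ending' if both parts are known, else keep whole."""
    for i in range(len(word) - 1, 0, -1):
        stem, ending = word[:i], word[i:]
        if stem in STEMS and ending in ENDINGS:
            return [stem + "+", "+" + ending]
    return [word]


def oov_rate(words, vocab, splitter=None):
    """Fraction of words with at least one unit missing from vocab."""
    if not words:
        return 0.0
    oov = 0
    for w in words:
        units = splitter(w) if splitter else [w]
        if any(u not in vocab for u in units):
            oov += 1
    return oov / len(words)


train = ["roka", "rokai", "galdu", "galds"]
test = ["roku", "galds", "rokai", "galdam"]

# Whole-word vocabulary: every training form is one entry.
word_vocab = set(train)
# Sub-word vocabulary: the units obtained by splitting the training forms.
subword_vocab = {u for w in train for u in split_word(w)}

word_oov = oov_rate(test, word_vocab)
subword_oov = oov_rate(test, subword_vocab, splitter=split_word)
print(word_oov, subword_oov)
```

In this toy data, the unseen form "roku" becomes covered because its stem and ending each occurred in other training forms, while "galdam" stays OOV because its ending was never seen.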




