Assessing the Performance of Automatic Speech Recognition Systems When Used by Native and Non-Native Speakers of Three Major Languages in Dictation Workflows

Julián Zapata
School of Translation and Interpretation, University of Ottawa, Canada

Andreas Søeborg Kirkedal
Copenhagen Business School & Mirsk Digital ApS, Denmark

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:25, pp. 201-210

NEALT Proceedings Series :25, p. 201-210

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper, we report on a two-part experiment that assesses and compares the performance of two types of automatic speech recognition (ASR) systems on two different computational platforms when used to augment dictation workflows. The experiment was performed with a sample of speakers of three major languages with different linguistic profiles: non-native English speakers, non-native French speakers, and native Spanish speakers. The main objective of the experiment was to examine ASR performance in translation dictation (TD) and medical dictation (MD) workflows without manual transcription versus with transcription. We discuss the advantages and drawbacks of a particular ASR approach on different computational platforms when used by various speakers of a given language, who may differ in accent, in proficiency in that language, and in competence and experience both with dictating large volumes of text and with ASR technology. Lastly, we enumerate several areas for future research.



