Fillers, alone or accompanied by pauses and/or gestures, are frequent in all types of spoken communication. They have numerous, non-exclusive functions related to interaction management (feedback and turn management) or discourse planning. Fillers are part of the language and thus, to some extent, language dependent. This article presents an analysis of fillers, filled pauses and co-occurring gestures in a Danish multimodal corpus of first encounters. The aims of the study are to determine the most common fillers in the corpus, the gestures co-occurring with them, their functions, and possibly their most prototypical uses. The results indicate that the most common fillers in the data are øh, mm, and øhm, all of which are accompanied by one or more gestures in most of their occurrences. We also found that each filler type has a predominant or prototypical use. Mm often occurs alone as a feedback marker and is accompanied by feedback gestures. Øhm has the longest duration and often precedes an utterance or a clausal phrase, signaling discourse planning. Its co-speech gestures also have interaction management functions. Finally, øh often precedes a content word, has a shorter duration than øhm, and signals lexical retrieval. Interestingly, the prototypical uses of the vocal øh and the vocal-nasal øhm are the same as those of the English vocal uh and vocal-nasal um, respectively.