This paper addresses the automatic annotation of head movements in videos of face-to-face conversations. Manual annotation of gestures is resource-consuming, and modelling gesture behaviour across different communicative settings requires large amounts of annotated data; developing methods for automatic annotation is therefore crucial. We present an approach in which an SVM classifier learns to classify head movements from measurements of velocity, acceleration, and jerk (the third derivative of position with respect to time). The trained classifier is then used to annotate head movements in new video data. Evaluated against manual annotations of the same data, the automatic annotation achieves an accuracy of 73.47%. The results also show that including jerk as a feature improves accuracy.
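To make the feature pipeline concrete, the following is a minimal sketch, not the paper's actual implementation: per-frame velocity, acceleration, and jerk are approximated by successive differentiation of a tracked head-position signal and used as features for an SVM. The dummy head track, the per-frame labels, and the scikit-learn SVC configuration are all illustrative assumptions.

```python
# Sketch only: frame-level head-movement classification from kinematic features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
positions = rng.normal(size=(1000, 2)).cumsum(axis=0)  # placeholder head track (x, y per frame)
labels = rng.integers(0, 2, size=1000)                 # placeholder manual annotations (1 = movement)

# Numerical derivatives along the time axis (frame interval taken as 1 unit).
velocity = np.gradient(positions, axis=0)
acceleration = np.gradient(velocity, axis=0)
jerk = np.gradient(acceleration, axis=0)               # third derivative of position

features = np.hstack([velocity, acceleration, jerk])   # (n_frames, 6) feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)          # kernel choice is an assumption
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

On real data, the placeholder arrays would be replaced by head coordinates from a tracker and by the manual movement annotations; dropping the jerk columns from the feature matrix gives a baseline for measuring its contribution.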