Top-down Bottom-up Experiments on Detecting Co-speech Gesturing in Conversation

Kristiina Jokinen
Institute of Computer Science, University of Tartu, Estonia

Published in: Proceedings from the 3rd European Symposium on Multimodal Communication, Dublin, September 17-18, 2015

Linköping Electronic Conference Proceedings 105:7, pp. 38-44

Published: 2016-09-16

ISBN: 978-91-7685-679-6

ISSN: 1650-3686 (print), 1650-3740 (online)


Automatic analysis of conversational videos is an area where technology has developed rapidly. This includes detecting the gesturing and body movement of the conversation partners. This paper deals with the application of video techniques to human communication studies and focuses on detecting communicative gesturing in conversational videos. The paper sets out to investigate the top-down, bottom-up methodology. This methodology aims to combine the two approaches used in interaction studies: human annotation of the data and automatic analysis of the data.
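
This summary does not describe the detection technique itself. The following is therefore only a minimal sketch of how the two directions could meet, under stated assumptions. The assumed inputs are grayscale frames given as flat lists of pixel intensities and human annotations given as frame intervals. On the bottom-up side, the sketch thresholds frame-to-frame pixel differences to find movement segments. On the top-down side, it scores those segments against annotated gesture intervals with frame-level precision and recall. Frame differencing, the threshold, the toy data, and all function names (`frame_differences`, `detect_segments`, `agreement`) are illustrative assumptions, not the paper's method.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a gesture or movement segment is a half-open
# interval [start, end) of frame indices.
Interval = Tuple[int, int]


def frame_differences(frames: List[List[int]]) -> List[float]:
    """Bottom-up cue: mean absolute pixel difference to the previous frame.

    Assumes every frame is a non-empty, equally sized flat list of grayscale
    intensities. The first frame has no predecessor and gets a score of 0.0.
    """
    if not frames:
        return []
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return scores


def detect_segments(scores: List[float], threshold: float, min_len: int = 1) -> List[Interval]:
    """Group consecutive frames whose score reaches the threshold into segments,
    dropping segments shorter than min_len frames."""
    segments: List[Interval] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # movement begins
        elif score < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # movement ends
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # movement runs to the last frame
    return segments


def frames_in(intervals: List[Interval]) -> Set[int]:
    """Expand half-open intervals into the set of frame indices they cover."""
    return {f for start, end in intervals for f in range(start, end)}


def agreement(detected: List[Interval], annotated: List[Interval]) -> Tuple[float, float]:
    """Top-down check: frame-level precision and recall of the automatic
    segments against human-annotated gesture intervals."""
    det, ann = frames_in(detected), frames_in(annotated)
    hits = len(det & ann)
    precision = hits / len(det) if det else 0.0
    recall = hits / len(ann) if ann else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy clip of four-pixel frames: the middle pixels brighten (movement),
    # stay unchanged for one frame (a hold), then return to rest.
    clip = [
        [10, 10, 10, 10],
        [10, 10, 10, 10],
        [10, 50, 50, 10],
        [10, 90, 90, 10],
        [10, 90, 90, 10],
        [10, 10, 10, 10],
    ]
    scores = frame_differences(clip)
    detected = detect_segments(scores, threshold=5.0)
    annotated = [(2, 6)]  # hypothetical human annotation: gesture spans frames 2-5
    print(detected)  # [(2, 4), (5, 6)]
    print(agreement(detected, annotated))  # (1.0, 0.75)
```

On this toy input, the detector finds frames 2, 3 and 5. It misses frame 4, where the pixels do not change although the annotator marks it as part of the gesture. This gives a precision of 1.0 and a recall of 0.75. Mismatches of this kind are where human annotation could inform the automatic analysis, for example when choosing the threshold or the minimum segment length.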

