Conference article

Towards classification of head movements in audiovisual recordings of read news

Johan Frid
Humanities Laboratory, Lund University, Sweden

Gilbert Ambrazaitis
Linguistics and Phonetics, Centre for Languages and Literature, Lund University, Sweden

Malin Svensson-Lundmark
Linguistics and Phonetics, Centre for Languages and Literature, Lund University, Sweden

David House
Department of Speech, Music and Hearing, KTH, Sweden


Published in: Proceedings of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), Copenhagen, 29-30 September 2016

Linköping Electronic Conference Proceedings 141:2, pp. 4–9


Published: 2017-09-21

ISBN: 978-91-7685-423-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

In this paper we develop a system for detecting word-related head movements in audiovisual recordings of read news. Our material consists of Swedish television news broadcasts and comprises audiovisual recordings of five news readers (two female, three male). The corpus was manually labelled for head movement using a simple annotation scheme: a binary decision about the absence or presence of a movement in relation to a word. We use OpenCV for frontal face detection and, based on the detected faces, calculate velocity and acceleration features. We then train a machine learning system to predict the absence or presence of head movement and achieve an accuracy of 0.892, which is better than the baseline. The system may thus be helpful for head movement labelling.
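The processing chain summarised above (face detection, motion features, a binary word-level decision, comparison with a baseline) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions. It starts from face boxes in the (x, y, w, h) format returned by OpenCV's `CascadeClassifier.detectMultiScale`. Velocity and acceleration are derived with a 5-point Savitzky-Golay filter: the method appears in the reference list, but the window length and polynomial order used here are assumed. Frames are aggregated per word using hypothetical choices (peak speed, mean acceleration), and a fixed speed threshold stands in for the trained classifier. The abstract does not name the learner (the reference list includes XGBoost, which is not reproduced here) or the type of baseline, so a majority-class baseline is assumed.

```python
from statistics import mean

# 5-point Savitzky-Golay convolution coefficients for a quadratic fit
# (standard tabulated values). Assumption: the window length and
# polynomial order actually used in the paper are not given in the abstract.
SG5_D1 = (-2, -1, 0, 1, 2)   # first derivative, normalise by 10 * dt
SG5_D2 = (2, -1, -2, -1, 2)  # second derivative, normalise by 7 * dt**2


def face_centre(box):
    """Centre (x, y) of a face box in OpenCV's (x, y, w, h) format."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def sg_derivative(signal, coeffs, norm):
    """Convolve with a symmetric Savitzky-Golay derivative kernel.

    Frames closer to either edge than half a window are set to 0.0,
    a simplification that keeps the sketch short."""
    half = len(coeffs) // 2
    out = [0.0] * len(signal)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(coeffs, window)) / norm
    return out


def motion_features(boxes, fps):
    """Per-frame speed (px/s) and acceleration magnitude (px/s^2)."""
    dt = 1.0 / fps
    xs, ys = zip(*(face_centre(b) for b in boxes))
    vx = sg_derivative(xs, SG5_D1, 10 * dt)
    vy = sg_derivative(ys, SG5_D1, 10 * dt)
    ax = sg_derivative(xs, SG5_D2, 7 * dt ** 2)
    ay = sg_derivative(ys, SG5_D2, 7 * dt ** 2)
    speed = [(a ** 2 + b ** 2) ** 0.5 for a, b in zip(vx, vy)]
    accel = [(a ** 2 + b ** 2) ** 0.5 for a, b in zip(ax, ay)]
    return speed, accel


def word_features(speed, accel, fps, words):
    """Aggregate frame features over each word interval (start_s, end_s).

    Hypothetical choice: peak speed and mean acceleration per word.
    Assumes every interval lies inside the recording."""
    feats = []
    for start, end in words:
        lo = int(start * fps)
        hi = max(lo + 1, int(end * fps))
        feats.append((max(speed[lo:hi]), mean(accel[lo:hi])))
    return feats


def predict(feats, speed_threshold):
    """Stand-in for the trained classifier: 1 = head movement, 0 = none."""
    return [1 if peak > speed_threshold else 0 for peak, _ in feats]


def accuracy(pred, gold):
    """Proportion of words whose predicted label matches the manual label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def majority_baseline(gold):
    """Accuracy obtained by always predicting the most frequent label."""
    majority = max(set(gold), key=gold.count)
    return accuracy([majority] * len(gold), gold)


if __name__ == "__main__":
    fps = 30
    # Synthetic face boxes: still for 15 frames, then drifting 6 px/frame.
    boxes = [(100 + 6 * max(0, i - 14), 50, 80, 80) for i in range(30)]
    speed, accel = motion_features(boxes, fps)
    words = [(0.0, 0.2), (0.2, 0.4), (0.6, 0.9)]  # word intervals in seconds
    gold = [0, 0, 1]                              # binary manual labels
    pred = predict(word_features(speed, accel, fps, words), 50.0)
    print("accuracy:", accuracy(pred, gold))
    print("majority baseline:", majority_baseline(gold))
```

The synthetic boxes and the 50 px/s threshold are illustrative only. With real data, the per-word feature tuples would be passed to a trained model rather than to `predict`, and the resulting accuracy compared against whatever baseline the evaluation defines.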

References

Ambrazaitis, G., Svensson Lundmark, M., & House, D. (2015). Multimodal levels of prominence: A preliminary analysis of head and eyebrow movements in Swedish news broadcasts. In M. Svensson Lundmark, G. Ambrazaitis, & J. van de Weijer (Eds.), Working Papers in General Linguistics and Phonetics 55 (Proceedings from Fonetik 2015) (pp. 11–16). Centre for Languages and Literature, Lund University.

Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer [Computer program]. http://www.praat.org/

Bruce, G. (1977). Swedish word accents in sentence perspective (Travaux de l’institut de linguistique de Lund 12). Malmö: Gleerup.

Bruce, G., & Granström, B. (1993). Prosodic modelling in Swedish speech synthesis. Speech Communication, 13, 63–73.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Fleiss, J. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

Nyström, M., & Holmqvist, K. (2010). An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42, 188–204. doi:10.3758/BRM.42.1.188

Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36, 1627–1639.

Viola, P., & Jones, M. J. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 511–518).

Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: A professional framework for multimodality research. In Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation. See also: http://tla.mpi.nl/tools/tla-tools/elan/

Zhang, S., Wu, Z., Meng, H., & Cai, L. (2007). Head movement synthesis based on semantic and prosodic features for a Chinese expressive avatar. In Proceedings of ICASSP 2007 (Vol. 4, pp. 837–840).

Citations in Crossref