Conference article

Identification of Emphasised Regions in Audio-Visual Presentations

Keith Curtis
ADAPT Centre, School of Computing, Dublin City University, Ireland

Gareth J.F. Jones
ADAPT Centre, School of Computing, Dublin City University, Ireland

Nick Campbell
ADAPT Centre, School of Computer Science & Statistics, Trinity College Dublin, Ireland

Published in: Proceedings of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), Copenhagen, 29-30 September 2016

Linköping Electronic Conference Proceedings 141:6, pp. 37-42

Published: 2017-09-21

ISBN: 978-91-7685-423-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Rapidly expanding archives of audio-visual recordings available online are making unprecedented amounts of information accessible for many applications. New and efficient techniques to access this information are needed to fully realise the potential of these archives. We investigate the identification of areas of intentional or unintentional emphasis during audio-visual presentations and lectures. We find that, unlike in audio-only recordings, where emphasis can be located using pitch information alone, perceived emphasis can be strongly associated with information from the visual stream, such as gesticulation. We also investigate potential correlations between emphasised speech and increased levels of audience engagement during audio-visual presentations and lectures.
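
As a rough illustration of the pitch-only baseline mentioned in the abstract, the sketch below tracks the speaker's fundamental frequency and flags frames that rise well above the speaker's own pitch baseline. It is a minimal sketch, not the authors' method: the use of librosa, the z-score threshold, and the file name are all illustrative assumptions.

import numpy as np
import librosa

def pitch_emphasis_times(audio_path, z_threshold=1.5, hop_length=512):
    # Load the presentation audio at its native sampling rate.
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    # Track fundamental frequency (F0) with probabilistic YIN.
    f0, voiced, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),  # ~65 Hz, low end of speech
        fmax=librosa.note_to_hz("C6"),  # ~1047 Hz, high end of speech
        sr=sr,
        hop_length=hop_length,
    )
    f0 = np.where(voiced, f0, np.nan)   # keep voiced frames only
    # Score each frame against the speaker's own pitch baseline.
    baseline = np.nanmedian(f0)
    spread = np.nanstd(f0) + 1e-9
    z = (f0 - baseline) / spread
    emphatic = np.nan_to_num(z) > z_threshold
    # Convert flagged frames to timestamps in seconds.
    times = librosa.frames_to_time(np.arange(len(f0)), sr=sr, hop_length=hop_length)
    return times[emphatic]

# Example usage (hypothetical file name):
# print(pitch_emphasis_times("lecture.wav"))

A comparable score for the visual stream could be built from frame-to-frame motion magnitude (for example, optical flow over the speaker region) to capture the gesticulation cues the abstract points to; that extension is not shown here.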

Keywords

No keywords are available
