Human-human, human-machine communication: on the HuComTech multimodal corpus

Laszlo Hunyadi
Department of General and Applied Linguistics, University of Debrecen, Debrecen, Hungary

Tamás Váradi
MTA Institute of Linguistics, Research Group on Language Technology, Budapest, Hungary

György Kovács
MTA SzTE Research Group on Artificial Intelligence, Szeged, Hungary / Embedded Internet Systems Lab, Luleå University of Technology, Luleå, Sweden

István Szekrényes
Institute of Philosophy, University of Debrecen, Hungary

Hermina Kiss
Department of General and Applied Linguistics, University of Debrecen, Debrecen, Hungary

Karolina Takács
Department of Phonetics, Eötvös Loránd University, Budapest, Hungary


Included in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:6, pp. 56-65


Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)


The present paper describes HuComTech, a multimodal corpus featuring over 50 hours of videotaped interviews with 112 informants. The interviews were carried out in a lab equipped with multiple cameras and microphones that captured posture, hand gestures, facial expressions, gaze, etc., as well as the acoustic and linguistic features of what was said. As a result of large-scale manual and semi-automatic annotation, the HuComTech corpus offers a rich dataset on 47 annotation levels. The paper presents the objectives, the workflow and the annotation work, focusing in particular on two aspects: time alignment performed with the Leipzig tool WebMAUS, and the automatic detection of intonation contours developed by the HuComTech team. Early exploitation of the corpus included the discovery of hidden patterns through multivariate analysis of the temporal relations among data points. The HuComTech corpus is one of the flagship language resources available through the HunCLARIN repository.
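To illustrate what automatic intonation-contour labeling involves, here is a minimal sketch, assuming a voiced stretch has already been delimited and its F0 track extracted (e.g. with Praat), with unvoiced frames given as 0. This is not the HuComTech team's detector: the function names, the 100 Hz semitone reference and the 2 semitones-per-second threshold are hypothetical choices made for illustration only. The sketch converts F0 to semitones, fits a least-squares line over the voiced frames, and labels the segment by the sign and size of the slope.

```python
import math
from typing import Optional, Sequence


def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert a pitch value in Hz to semitones relative to ref_hz (hypothetical 100 Hz reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def ls_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values over times (0.0 if all times coincide)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den if den else 0.0


def label_contour(
    times: Sequence[float],
    f0_hz: Sequence[float],
    threshold_st_per_s: float = 2.0,  # hypothetical threshold, not taken from the paper
) -> Optional[str]:
    """Label a segment as 'rise', 'fall' or 'level' from its F0 track.

    Frames with f0 <= 0 are treated as unvoiced and skipped.
    Returns None when fewer than two voiced frames remain.
    """
    voiced = [(t, hz_to_semitones(f)) for t, f in zip(times, f0_hz) if f > 0]
    if len(voiced) < 2:
        return None
    ts, st = zip(*voiced)
    slope = ls_slope(ts, st)  # semitones per second
    if slope > threshold_st_per_s:
        return "rise"
    if slope < -threshold_st_per_s:
        return "fall"
    return "level"


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 0.3, 0.4]
    rising = [100.0 * 1.1 ** k for k in range(5)]  # about +1.65 semitones per frame
    print(label_contour(times, rising))
    print(label_contour(times, list(reversed(rising))))
    print(label_contour(times, [120.0] * 5))
```

Working in semitones rather than Hz makes the threshold comparable across speakers with different pitch ranges; a real detector would also need to handle octave errors, microprosodic perturbations and contours that change direction within a segment.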


Multimodality, Multimodal corpus, Hidden patterns of communication, Human-machine communication


