Corpus-driven conversational agents: tools and resources for multimodal dialogue systems development

Maria Di Maro
Department of Humanities, University of Naples ‘Federico II’, Italy


Part of: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:4, pp. 39-45


Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)


In this paper, we describe how tools made available through CLARIN can be applied in research on the development of corpus-driven conversational agents. We start by describing a standard architecture for multimodal dialogue systems. For some of its components, we then briefly describe specific available tools and assess their suitability for multimodal dialogue system development.


Multimodal dialogue systems, Tools, Corpora


