KILLE: a Framework for Situated Agents for Learning Language Through Interaction

Simon Dobnik
University of Gothenburg, Sweden

Erik Wouter de Graaf
University of Gothenburg, Sweden


Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22–24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:19, pp. 162–171

NEALT Proceedings Series 29:19, pp. 162–171


Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)


We present KILLE, a framework for situated agents that learn language through interaction with their environment (perception) and with a human tutor (dialogue). We provide proof-of-concept evaluations of the system's usability in two domains: learning object categories and learning spatial relations.
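As a rough, hypothetical illustration of learning object categories from a tutor, the Python sketch below models two dialogue moves. The tutor labels an observation ("This is a cup"), or the tutor asks the agent to name one ("What is this?"). Everything here is an assumption and is not taken from KILLE. This includes the `TutoredLearner` class, the representation of percepts as fixed-length feature vectors, the nearest-neighbour classifier, and the distance threshold at which the agent admits ignorance and asks to be taught.

```python
"""Hypothetical sketch of tutor-driven category learning (not KILLE's implementation)."""
from __future__ import annotations

import math
from dataclasses import dataclass, field

# Assumption: a percept is summarised as a fixed-length numeric feature vector.
Feature = tuple[float, ...]


@dataclass
class TutoredLearner:
    # Assumed distance above which the agent says it does not know the object.
    threshold: float = 1.0
    # Labelled observations collected from the tutor so far.
    examples: list[tuple[Feature, str]] = field(default_factory=list)

    def learn(self, features: Feature, label: str) -> str:
        """Tutor says 'This is a <label>': store the labelled observation."""
        self.examples.append((features, label))
        return f"OK, this is a {label}."

    def classify(self, features: Feature) -> str | None:
        """Return the label of the nearest stored example, or None if it is too far away."""
        if not self.examples:
            return None
        dist, label = min(
            (math.dist(features, stored), lab) for stored, lab in self.examples
        )
        return label if dist <= self.threshold else None

    def answer(self, features: Feature) -> str:
        """Tutor asks 'What is this?': answer, or ask the tutor to teach the category."""
        label = self.classify(features)
        if label is None:
            return "I don't know. What is it?"
        return f"This is a {label}."


# Usage: two taught categories, then one familiar query and one unfamiliar query.
agent = TutoredLearner(threshold=1.0)
agent.learn((0.0, 0.0), "cup")
agent.learn((5.0, 5.0), "book")
print(agent.answer((0.3, 0.1)))  # This is a cup.
print(agent.answer((9.0, 9.0)))  # I don't know. What is it?
```

The "I don't know" reply is the point where, in an interactive setting, the tutor would supply a new label through `learn`. Each dialogue exchange therefore adds training data incrementally rather than requiring a fixed dataset up front.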




