Robotics: Science and Systems IX

Toward Interactive Grounded Language Acquisition

Thomas Kollar, Jayant Krishnamurthy, Grant Strimel


This paper addresses the problem of enabling robots to learn visual and spatial models interactively from multi-modal input involving speech, gesture, and images. Our approach, called Logical Semantics with Perception (LSP), provides a natural and intuitive interface by significantly reducing the supervision a human must provide. This paper demonstrates LSP in an interactive setting. Given speech and gesture input, LSP learns classifiers for objects such as mugs and for spatial relations such as left and right. We extend LSP to generate complex natural language descriptions of selected objects using adjectives, nouns, and relations, such as “the orange mug to the right of the green book.” Furthermore, we extend LSP to incorporate determiners (e.g., “the”) into its training procedure, enabling the model to generate acceptable relational language 20% more often than the unaugmented model.



@INPROCEEDINGS{Kollar-RSS-13,
    AUTHOR    = {Thomas Kollar AND Jayant Krishnamurthy AND Grant Strimel},
    TITLE     = {Toward Interactive Grounded Language Acquisition},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2013},
    ADDRESS   = {Berlin, Germany},
    MONTH     = {June},
    DOI       = {10.15607/RSS.2013.IX.005}
}