Robotics: Science and Systems XVII

INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

Hanbo Zhang*, Yunfan Lu*, Cunjun Yu, David Hsu, Xuguang Lan, Nanning Zheng
* These authors contributed equally


This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE embodies several challenges: (i) infer the target object among other occluding objects, from input language expressions and RGB images; (ii) infer object blocking relationships (OBRs) from the images; and (iii) synthesize a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, visual grounding, question generation, and OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and negatively impact the robot’s performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural-language interactions. A demonstration video is available online:
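The abstract's core idea of tracking an observation history over uncertain perception can be sketched as a Bayesian belief update over candidate target objects. The snippet below is an illustrative toy, not the authors' implementation: the likelihood values standing in for the visual-grounding scores and the answer model are hypothetical.

```python
def belief_update(belief, likelihoods):
    """Bayes update: posterior[i] is proportional to belief[i] * P(obs | target = i)."""
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Three candidate objects, initially equally likely.
belief = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical grounding scores after the human says "the red cup".
grounding_scores = [0.7, 0.2, 0.1]
belief = belief_update(belief, grounding_scores)

# Hypothetical answer likelihoods after asking a disambiguation
# question ("Do you mean the one on the left?") and hearing "yes".
answer_likelihoods = [0.9, 0.1, 0.1]
belief = belief_update(belief, answer_likelihoods)

# Commit to a grasp once one candidate dominates the belief.
target = max(range(len(belief)), key=lambda i: belief[i])
```

In the full system the robot would plan over such beliefs, trading the cost of asking another question against the risk of grasping the wrong object.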



@INPROCEEDINGS{Zhang-RSS-21,
    AUTHOR    = {Hanbo Zhang AND Yunfan Lu AND Cunjun Yu AND David Hsu AND Xuguang Lan AND Nanning Zheng},
    TITLE     = {{INVIGORATE: Interactive Visual Grounding and Grasping in Clutter}},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2021},
    ADDRESS   = {Virtual},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2021.XVII.020}
}