Robotics: Science and Systems XX
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting
Kuan Fang, Fangchen Liu, Pieter Abbeel, Sergey Levine

Abstract:
Open-world generalization requires robotic systems to have a profound understanding of the physical world and user commands in order to solve diverse and complex tasks. While recent advances in vision-language models (VLMs) have offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this paper, we introduce Marking Open-world Keypoint Affordances (MOKA), an approach that employs VLMs to solve robotic manipulation tasks specified by free-form language instructions. Central to our approach is a compact point-based representation of affordance, which bridges the VLM’s predictions on observed images and the robot’s actions in the physical world. By prompting the pre-trained VLM, our approach utilizes the VLM’s commonsense knowledge and concept understanding acquired from broad data sources to predict affordances and generate motions. To facilitate the VLM’s reasoning in zero-shot and few-shot settings, we propose a visual prompting technique that annotates marks on images, converting affordance reasoning into a series of visual question-answering problems that are solvable by the VLM. We further explore methods to enhance performance with robot experiences collected by MOKA through in-context learning and policy distillation. We evaluate and analyze MOKA’s performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
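To make the idea of mark-based visual prompting concrete, the sketch below illustrates the general pattern the abstract describes: annotate candidate keypoints ("marks") on the observed image, pose the affordance query to a VLM as visual question answering, and map the chosen marks back to pixel coordinates. This is a minimal illustration, not the authors' implementation: the `query_vlm` callable, the JSON answer format, and the uniform grid of candidate marks are all assumptions made for the example, and converting the returned keypoints into robot motions (e.g., via depth and camera calibration) is left out.

```python
from PIL import Image, ImageDraw


def annotate_marks(image: Image.Image, stride: int = 80) -> tuple[Image.Image, dict]:
    """Overlay a grid of numbered candidate keypoints ("marks") on the observation.

    A uniform grid is a simplification; candidate points could instead come from
    object detections or segmentation of the scene.
    """
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    marks = {}
    label = 0
    for y in range(stride // 2, image.height, stride):
        for x in range(stride // 2, image.width, stride):
            marks[str(label)] = (x, y)
            draw.ellipse((x - 4, y - 4, x + 4, y + 4), fill="red")
            draw.text((x + 6, y - 6), str(label), fill="red")
            label += 1
    return marked, marks


def select_affordance_points(image: Image.Image, instruction: str, query_vlm) -> dict:
    """Cast affordance reasoning as visual question answering over the marked image.

    `query_vlm` is a hypothetical wrapper around a pre-trained VLM that accepts an
    annotated image plus a text prompt and returns a dict of chosen mark labels,
    e.g. {"grasp": "12", "target": "31"}.
    """
    marked, marks = annotate_marks(image)
    prompt = (
        f"The image shows numbered candidate points. For the task '{instruction}', "
        "answer with the point to grasp and the point to move to, as JSON "
        '{"grasp": <id>, "target": <id>}.'
    )
    answer = query_vlm(marked, prompt)
    # Map the VLM's chosen mark labels back to pixel coordinates.
    return {role: marks[mark_id] for role, mark_id in answer.items()}
```

Under these assumptions, the returned pixel keypoints act as the compact point-based affordance representation mentioned above; a downstream motion generator would lift them into the robot's workspace to produce the actual manipulation motion.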
Bibtex:
@INPROCEEDINGS{Fang-RSS-24,
  AUTHOR    = {Kuan Fang AND Fangchen Liu AND Pieter Abbeel AND Sergey Levine},
  TITLE     = {{MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting}},
  BOOKTITLE = {Proceedings of Robotics: Science and Systems},
  YEAR      = {2024},
  ADDRESS   = {Delft, Netherlands},
  MONTH     = {July},
  DOI       = {10.15607/RSS.2024.XX.062}
}