Robotics: Science and Systems XVII

Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos

Annie S. Chen, Suraj Nair, Chelsea Finn


We are motivated by the goal of generalist robots that can complete a wide range of tasks across many environments. Critical to this is the robot's ability to acquire some metric of task success or reward, which is necessary for reinforcement learning, planning, or knowing when to ask for help. For a general-purpose robot operating in the real world, this reward function must also be able to generalize broadly across environments, tasks, and objects, while depending only on on-board sensor observations (e.g. RGB images). While deep learning on large and diverse datasets has shown promise as a path towards such generalization in computer vision and natural language, collecting high quality datasets of robotic interaction at scale remains an open challenge. In contrast, “in-the-wild” videos of humans (e.g. YouTube) contain an extensive collection of people doing interesting tasks across a diverse range of settings. In this work, we propose a simple approach, Domain-agnostic Video Discriminator (DVD), that learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task, and can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos. We find that by leveraging diverse human datasets, this reward function (a) can generalize zero shot to unseen environments, (b) generalize zero shot to unseen tasks, and (c) can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
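
The same-task classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `make_pairs`, `rank_by_reward`, and `toy_score` are hypothetical, each video is reduced to a short feature vector, and `toy_score` (negative squared distance) stands in for a learned discriminator, whose architecture and training procedure are not described in this abstract.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a "video" is just a short feature vector here.
Video = Sequence[float]


def make_pairs(videos: List[Tuple[Video, str]]) -> List[Tuple[Video, Video, int]]:
    """Build (video_a, video_b, label) pairs; label is 1 when both share a task."""
    return [
        (va, vb, int(ta == tb))
        for (va, ta), (vb, tb) in combinations(videos, 2)
    ]


def rank_by_reward(
    score: Callable[[Video, Video], float],
    demo: Video,
    candidates: List[Video],
) -> List[int]:
    """Return candidate indices, highest same-task score against the demo first."""
    rewards = [score(demo, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: rewards[i], reverse=True)


def toy_score(a: Video, b: Video) -> float:
    """Assumed placeholder for a trained discriminator: negative squared distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


# Mixed human and robot clips with task labels (toy data, not from the paper).
dataset = [
    ([0.0, 1.0], "close drawer"),  # human clip
    ([0.1, 0.9], "close drawer"),  # robot clip
    ([1.0, 0.0], "push cup"),      # human clip
]
pair_labels = [label for _, _, label in make_pairs(dataset)]

# Score two candidate robot clips against a single human demo.
ranking = rank_by_reward(toy_score, [0.0, 1.0], [[1.0, 0.0], [0.1, 0.9]])
```

In this sketch, pairs that mix human and robot clips are labeled the same way as any other pair, and the score against one human demo is used as the reward for ranking candidate robot clips, for example inside a planner.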



    AUTHOR    = {Annie S. Chen AND Suraj Nair AND Chelsea Finn}, 
    TITLE     = {{Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2021}, 
    ADDRESS   = {Virtual}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2021.XVII.012}