Robotics: Science and Systems XX

Any-point Trajectory Modeling for Policy Learning

Chuan Wen, Xingyu Lin, John Ian Reyes So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

Abstract:

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across the 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos.
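The abstract describes a two-stage recipe: pre-train a model on action-free videos to predict where arbitrary query points in a frame will move, then train a policy that conditions on those predicted point tracks using only a small action-labeled dataset. The sketch below is a minimal, hypothetical rendering of that idea in PyTorch; every module name, network shape, and the uniform query-point sampling are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class TrajectoryModel(nn.Module):
    """Stage 1 (assumed form): predict future 2D tracks for arbitrary
    query points in a single frame. Pre-trained on action-free videos,
    e.g. against tracks from an off-the-shelf point tracker."""
    def __init__(self, img_dim=512, hidden=256, horizon=16):
        super().__init__()
        self.horizon = horizon
        # Stand-in visual backbone; any image encoder would do here.
        self.frame_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(img_dim), nn.ReLU())
        self.point_enc = nn.Linear(2, hidden)        # embed (x, y) query points
        self.fuse = nn.Linear(img_dim + hidden, hidden)
        self.head = nn.Linear(hidden, horizon * 2)   # a future (x, y) track per point

    def forward(self, frame, points):
        # frame: (B, C, H, W); points: (B, N, 2) in normalized coordinates
        ctx = self.frame_enc(frame)                        # (B, img_dim)
        q = self.point_enc(points)                         # (B, N, hidden)
        ctx = ctx.unsqueeze(1).expand(-1, q.shape[1], -1)  # broadcast frame context
        h = torch.relu(self.fuse(torch.cat([ctx, q], dim=-1)))
        return self.head(h).view(points.shape[0], points.shape[1], self.horizon, 2)

class TrajectoryGuidedPolicy(nn.Module):
    """Stage 2 (assumed form): the frozen trajectory model supplies point
    tracks as dense guidance; the policy maps frame + tracks to an action."""
    def __init__(self, traj_model, n_points=32, act_dim=7):
        super().__init__()
        self.traj_model = traj_model
        self.n_points = n_points
        self.policy = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, frame):
        B = frame.shape[0]
        # Uniform random query points; the paper's point selection is an assumption here.
        points = torch.rand(B, self.n_points, 2, device=frame.device)
        with torch.no_grad():                        # trajectory model stays frozen
            tracks = self.traj_model(frame, points)  # (B, N, horizon, 2)
        feat = torch.cat([frame.flatten(1), tracks.flatten(1)], dim=-1)
        return self.policy(feat)

# Shape check: a batch of two 64x64 frames yields two 7-DoF actions.
policy = TrajectoryGuidedPolicy(TrajectoryModel())
actions = policy(torch.randn(2, 3, 64, 64))  # -> (2, 7)

Under this reading, the pre-training stage needs no action labels at all, which is what lets the method draw on raw video; only the second, behavior-cloning stage consumes the minimal action-labeled data the abstract mentions.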

Bibtex:

@INPROCEEDINGS{Wen-RSS-24, 
    AUTHOR    = {Chuan Wen AND Xingyu Lin AND John Ian Reyes So AND Kai Chen AND Qi Dou AND Yang Gao AND Pieter Abbeel}, 
    TITLE     = {{Any-point Trajectory Modeling for Policy Learning}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2024}, 
    ADDRESS   = {Delft, Netherlands}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2024.XX.092} 
}