Robotics: Science and Systems XVII

Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments

Visak C V Kumar, Sehoon Ha, C. Karen Liu

Abstract:

Simulation provides a safe and efficient way to generate useful data for learning complex robotic tasks. However, matching simulation and real-world dynamics can be quite challenging, especially for systems that have a large number of unobserved or unmeasurable parameters, which may lie in the robot dynamics itself or in the environment with which the robot interacts. We introduce a novel approach to tackle such a sim-to-real problem by developing policies capable of adapting to new environments in a zero-shot manner. Key to our approach is an error-aware policy (EAP) that is explicitly made aware of the effect of unobservable factors during training. An EAP takes as input the predicted future state error in the target environment, provided by an error-prediction function trained simultaneously with the EAP. We validate our approach on an assistive walking device trained to help the human user recover from external pushes. We show that a trained EAP for a hip-torque assistive device can be transferred to different human agents with unseen biomechanical characteristics. In addition, we show that our method can be applied to other standard RL control tasks.
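To make the policy structure described above concrete, the following is a minimal sketch (not the authors' implementation) of how an error-aware policy might be wired: an error-prediction module estimates the deviation of the target environment's next state from the nominal simulation, and the policy conditions on the current state plus that predicted error. All network sizes, names, and dimensions here are illustrative assumptions.

```python
# Hypothetical sketch of an error-aware policy (EAP) with a co-trained
# error predictor. Dimensions and architectures are assumed, not taken
# from the paper.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 32, 8  # assumed dimensions


class ErrorPredictor(nn.Module):
    """Predicts how the next state in the target environment will deviate
    from what the nominal (simulation) dynamics would produce."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, STATE_DIM))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class ErrorAwarePolicy(nn.Module):
    """Policy that takes the current state together with the predicted
    state error as input."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh())

    def forward(self, state, predicted_error):
        return self.net(torch.cat([state, predicted_error], dim=-1))


if __name__ == "__main__":
    predictor, policy = ErrorPredictor(), ErrorAwarePolicy()
    state = torch.zeros(1, STATE_DIM)
    prev_action = torch.zeros(1, ACTION_DIM)
    # The predicted error from the most recent transition informs the action.
    eps_hat = predictor(state, prev_action)
    action = policy(state, eps_hat)
    print(action.shape)  # torch.Size([1, 8])
```

In this sketch the predictor would be regressed against observed state errors collected in the target (or perturbed simulation) environment while the policy is trained with standard RL; the exact training procedure is described in the paper.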

Bibtex:

@INPROCEEDINGS{KumarV-RSS-21, 
    AUTHOR    = {Visak C V Kumar AND Sehoon Ha AND C. Karen Liu}, 
    TITLE     = {{Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2021}, 
    ADDRESS   = {Virtual}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2021.XVII.065} 
}