Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

Alexander Herzog; Kanishka Rao; Karol Hausman; Yao Lu; Paul Wohlhart; Mengyuan Yan; Jessica Lin; Montserrat Gonzalez Arenas; Ted Xiao; Daniel Kappler; Daniel Ho; Jarek Rettinghouse; Yevgen Chebotar; Kuang-Huei Lee; Keerthana Gopalakrishnan; Ryan Julian; Adrian Li; Chuyuan Fu; Bob Wei; Sangeetha Ramesh; Khem Holden; Kim Kleiven; David J Rendleman; Sean Kirmani; Jeffrey Bingham; Jonathan Weisz; Ying Xu; Wenlong Lu; Matthew Bennice; Cody Fong; David Do; Jessica Lam; Yunfei Bai; Benjie Holson; Michael Quinlan; Noah Brown; Mrinal Kalakrishnan; Julian Ibarz; Peter Pastor; Sergey Levine

Robotics: Science and Systems XIX

Abstract:

We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems to boost generalization to novel objects while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also includes 4800 evaluation trials across 240 waste station configurations, which we use to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects.

Bibtex:

  
@INPROCEEDINGS{Herzog-RSS-23, 
    AUTHOR    = {Alexander Herzog AND Kanishka Rao AND Karol Hausman AND Yao Lu AND Paul Wohlhart AND Mengyuan Yan AND Jessica Lin AND Montserrat Gonzalez Arenas AND Ted Xiao AND Daniel Kappler AND Daniel Ho AND Jarek Rettinghouse AND Yevgen Chebotar AND Kuang-Huei Lee AND Keerthana Gopalakrishnan AND Ryan Julian AND Adrian Li AND Chuyuan Fu AND Bob Wei AND Sangeetha Ramesh AND Khem Holden AND Kim Kleiven AND David J Rendleman AND Sean Kirmani AND Jeffrey Bingham AND Jonathan Weisz AND Ying Xu AND Wenlong Lu AND Matthew Bennice AND Cody Fong AND David Do AND Jessica Lam AND Yunfei Bai AND Benjie Holson AND Michael Quinlan AND Noah Brown AND Mrinal Kalakrishnan AND Julian Ibarz AND Peter Pastor AND Sergey Levine}, 
    TITLE     = {{Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2023}, 
    ADDRESS   = {Daegu, Republic of Korea}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2023.XIX.022} 
}