Robotics: Science and Systems XX
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Maria Attarian, Maria Bauza Villalonga, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil J Joshi, Ben Jyenis, J Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, Kenneth Oslund, Dushyant Rao, Allen Z. Ren, Baruch Tabanpour, Quan Vuong, Ayzaan Wahid, Ted Xiao, Ying Xu, Vincent Zhuang, Peng Xu, Erik Frey, Ken Caluwaerts, Tingnan Zhang, Brian Ichter, Jonathan Tompson, Leila Takayama, Vincent Vanhoucke, Izhak Shafran, Maja Mataric, Dorsa Sadigh, Nicolas Heess, Kanishka Rao, Nik Stewart, Jie Tan, Carolina Parada

Abstract:
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM and can be forgotten over longer interactions. In this work, we investigate fine-tuning robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by the average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations and robot code outputs are actions), then training an LLM to complete previous interactions amounts to training a transition dynamics model -- one that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates on unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning of new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/
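The MPC framing in the abstract can be illustrated with a minimal sketch: treat the dialog as a POMDP trajectory, use a model to roll candidate interactions forward, and keep the first action of the rollout that predicts the fewest human corrections. Everything here is illustrative, not the paper's actual API: the fine-tuned LLM is stubbed out as a toy random transition model, and the names `toy_llm`, `rollout`, and `lmpc_select` are assumptions of this sketch.

```python
import random

def toy_llm(dialog):
    """Stub transition model: given a dialog (list of turns), sample the
    next (human_feedback, robot_code) pair. A real system would query the
    fine-tuned LLM here instead."""
    feedback = random.choice(["move slower", "a bit higher", "success"])
    code = f"robot.act('{feedback}')"
    return feedback, code

def rollout(dialog, horizon):
    """Roll the dialog forward until predicted success or the horizon is
    reached; return (num_corrections, first_robot_action)."""
    dialog = list(dialog)
    first_action = None
    for step in range(horizon):
        feedback, code = toy_llm(dialog)
        if first_action is None:
            first_action = code
        dialog.append((feedback, code))
        if feedback == "success":
            return step, first_action
    return horizon, first_action

def lmpc_select(dialog, num_samples=16, horizon=5):
    """MPC step: sample several rollouts and execute only the first action
    of the rollout that predicts the fewest corrections."""
    best = min((rollout(dialog, horizon) for _ in range(num_samples)),
               key=lambda t: t[0])
    return best[1]

random.seed(0)
action = lmpc_select([("user", "pick up the block")])
print(action)
```

The key design point mirrored here is that only the first action of the best-scoring rollout is executed; the dialog is then re-planned after the human's next real response, as in classic receding-horizon MPC.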
Bibtex:
@INPROCEEDINGS{Liang-RSS-24,
    AUTHOR    = {Jacky Liang AND Fei Xia AND Wenhao Yu AND Andy Zeng AND Maria Attarian AND Maria Bauza Villalonga AND Matthew Bennice AND Alex Bewley AND Adil Dostmohamed AND Chuyuan Fu AND Nimrod Gileadi AND Marissa Giustina AND Keerthana Gopalakrishnan AND Leonard Hasenclever AND Jan Humplik AND Jasmine Hsu AND Nikhil J Joshi AND Ben Jyenis AND J Chase Kew AND Sean Kirmani AND Tsang-Wei Edward Lee AND Kuang-Huei Lee AND Assaf Hurwitz Michaely AND Joss Moore AND Kenneth Oslund AND Dushyant Rao AND Allen Z. Ren AND Baruch Tabanpour AND Quan Vuong AND Ayzaan Wahid AND Ted Xiao AND Ying Xu AND Vincent Zhuang AND Peng Xu AND Erik Frey AND Ken Caluwaerts AND Tingnan Zhang AND Brian Ichter AND Jonathan Tompson AND Leila Takayama AND Vincent Vanhoucke AND Izhak Shafran AND Maja Mataric AND Dorsa Sadigh AND Nicolas Heess AND Kanishka Rao AND Nik Stewart AND Jie Tan AND Carolina Parada},
    TITLE     = {{Learning to Learn Faster from Human Feedback with Language Model Predictive Control}},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2024},
    ADDRESS   = {Delft, Netherlands},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2024.XX.125}
}