This is a hybrid event with in-person attendance in Levine 307 and virtual attendance via Zoom.
ABSTRACT
Most tasks people wish robots could do (fetching objects across rooms, assisting in the kitchen, tidying) require mobile manipulation, the integration of navigation and manipulation. While robots have made remarkable progress in each skill independently, bringing them together sequentially (navigate → manipulate → navigate → ...) or simultaneously (coordinating base and arm motion to open a fridge or wipe a table) remains one of the hardest challenges in robotics. The difficulty lies not only in mastering two complex capabilities, but also in coupling them safely and efficiently, over long horizons, under uncertainty, and in contact-rich settings. These conditions often break the assumptions of standard imitation and reinforcement learning, which struggle to generalize, to train safely, and to anticipate, learn from, and recover from errors in unstructured environments. I’ll present three learning algorithms from my lab designed specifically for mobile manipulation: methods that extract skills from in-the-wild human video (SafeMimic), learn structured action spaces that make RL sample-efficient on real robots (SLAC), and integrate memory mechanisms with foundation models to reason over extended tasks (Bumble). Our latest results demonstrate multi-step imitation from a single video, surface-wiping RL on wheeled mobile manipulators trained in the real world in under one hour, and broad task generalization to novel objects at building-wide scale with improved trial efficiency. I’ll close with an analysis of failures and limitations and a roadmap for scaling: toward robots with the adaptability, safety, and fluency needed to make learned mobile manipulation an easy and reliable part of everyday life.