This event has passed.
Spring 2026 GRASP SFI: Chuning Zhu, University of Washington, “Toward Scalable Robot Learning via World Models”
April 8 @ 3:00 pm - 4:00 pm
This will be a hybrid event with in-person attendance in Levine 307 and virtual attendance on Zoom.
ABSTRACT
As data-driven approaches become the predominant paradigm for robotics, the burden of scaling robot data becomes increasingly apparent. The standard recipe for data-driven robot learning requires teleoperated expert demonstrations on real robots, which are expensive to scale. In this talk, we propose to use world models as a means to pool a vast amount of data from diverse sources for robot learning. The first part of the talk introduces a method for learning from video data by jointly modeling video and action diffusion processes. By utilizing diffusion noise as masking, we can flexibly incorporate action-free Internet videos into policy training, significantly improving the policy's visual generalization. The second part of the talk explores how world models in semantic space enable robot learning from vision-language data. By casting world modeling as Visual Question Answering (VQA) about the future, we inherit the rich pre-trained knowledge of vision-language models (VLMs) and enable versatile planning capabilities. The final part of the talk makes a connection between reasoning and latent world models. Using this principle, we build policies that learn from video data without pixel reconstruction, while enabling adaptive scaling of test-time compute.