Fall 2025 GRASP on Robotics: Alan Yuille, Johns Hopkins University, “3D Vision Language Models and Interactive World Models”

October 24, 2025 @ 10:30 am - 11:45 am

This event was in-person ONLY in Wu and Chen Auditorium…

ABSTRACT

Vision Language Models (VLMs) are extremely successful, but their performance degrades when they are asked questions involving spatial relations and 3D world knowledge. Inspired by Cognitive Science, we develop 3D VLMs which are 3D-aware and 3D-explicit to help us diagnose their failure modes. We present two approaches that involve developing datasets with 3D annotations for training the 3D VLMs. The first work was developed on realistic-synthetic datasets, and the 3D VLM is built on a 3D Image Parser. This 3D VLM significantly outperforms conventional VLMs on questions involving 3D/6D (Xingrui Wang et al., CVPR 2025 highlight) and physical reasoning (Xingrui Wang et al., ICLR 2025). This work is extended to complex images, taking VLMs as base models, and evaluated on a comprehensive 3D reasoning benchmark (W. Ma et al., ICCV 2026). We develop a 3D-VLM which significantly outperforms conventional VLMs when asked questions requiring 3D knowledge (Wufei Ma et al., CVPR 2025 highlight). We further extend this approach to develop a 3D-VLM which performs even better and is also 3D-explicit (Wufei Ma et al., NeurIPS 2025). We discuss the bigger picture, which involves the need for world models as illustrated by (J. Chen et al., ICLR 2025), analysis by synthesis (T. Zheng et al., NeurIPS 2025), and early detection of cancer using radiology reports (P. Bassi et al., MICCAI 2025).

Presenter

Alan Yuille

Alan Yuille received his B.A. in mathematics from the University of Cambridge in 1976, and completed his Ph.D. in theoretical physics at Cambridge in 1980, supervised by Stephen Hawking. He came to the US in 1981 to do postdoctoral work in theoretical physics at the Physics Department, University of Texas at Austin, and the Institute for Theoretical Physics, Santa Barbara. He then switched to AI and became a research scientist at the Artificial Intelligence Laboratory at MIT (1982-1986), which was followed by a postdoctoral appointment and then a faculty position in the Division of Applied Sciences at Harvard (1986-1995). From 1995 to 2002 he worked as a senior scientist at the Smith-Kettlewell Eye Research Institute in San Francisco. From 2002 to 2016 he was a full professor in the Department of Statistics at UCLA, with joint appointments in Psychology, Computer Science, and Psychiatry. In 2016 he became a Bloomberg Distinguished Professor in Cognitive Science and Computer Science at Johns Hopkins University. He has won a Marr Prize, a Marr Prize runner-up award, and a Helmholtz Prize, and is a Fellow of IEEE. He has broad research interests in vision, machine learning, cognitive science, medical image analysis, and neuroscience. He has over 600 peer-reviewed publications, over 139,000 citations, and an h-index of 149.

Details

Venue

Wu and Chen Auditorium
3330 Walnut Street
Philadelphia, PA 19104