*This seminar will be held in person in Raisler Lounge as well as virtually via Zoom.*
Computer vision has transformed from simple edge detection in the 1980s to modern generative models that produce uncannily realistic images: objects appear in sensible places, lighting seems plausible, and textures look accurate. But how do these models achieve such an understanding of our visual world?
Probing their internal representations reveals that these models encode fundamental aspects of physical reality. Within them, we discovered classical computer vision concepts such as intrinsic images — decompositions of scenes into color, shape, and lighting — learned without explicit supervision. These discoveries allow us to manipulate real photographs in physically plausible ways. However, we also find surprising gaps in their understanding, such as their failure to replicate principles of projective geometry, which provides reliable signatures for detecting generated images.
This talk explores what knowledge emerges within generative image models, revealing both their strengths and their weaknesses. I will discuss how these insights drive new applications and open challenges, pushing us closer to generative models grounded in the physical world.