Fall 2025 GRASP Seminar: Congyue Deng, Massachusetts Institute of Technology, “Geometric Deep Representations for Visual Understanding and Beyond”
December 10 @ 11:00 am - 12:00 pm
This event will be held in person in AGH 306.
ABSTRACT
Deep learning frameworks, both supervised and unsupervised, have achieved remarkable success across a wide range of 2D and 3D visual understanding tasks. However, while these models excel at capturing semantic aspects of visual data, they often struggle to represent or reason about geometric relationships within their high-dimensional latent spaces. For example, point cloud networks trained on well-aligned datasets like ShapeNet frequently fail when evaluated on objects in arbitrary poses. Such limitations are not isolated incidents; they reflect broader challenges in current learning paradigms, particularly when robustness, generalizability, and trustworthiness are essential in real-world applications.
In this talk, I will address these challenges from the perspective of representations in deep neural networks. Specifically, I will show how incorporating geometric operators into network architectures can enhance their ability to model a wide range of geometric transformations, from simple rigid motions to complex multi-body dynamics and deformations. I will present a series of approaches that embed geometric structure into latent spaces, leading to networks that demonstrate improved generalization, data efficiency, robustness, and interoperability across diverse visual tasks, from perception and understanding to interaction with the visual world.