GRASP Special Seminar: Yu Xiang, University of Washington, "3D Object Recognition and Scene Understanding from RGB-D Videos"


Recognizing objects and understanding scenes in 3D is critical for intelligent systems that interact with the 3D world. For instance, in robot manipulation and navigation, robots need to understand the 3D world in order to perform their tasks. In augmented reality, 3D scene understanding is necessary for offering a realistic user experience. In this talk, I will present our efforts in 3D object recognition and scene understanding from RGB-D videos. I will describe a new convolutional neural network for end-to-end 6D object pose estimation, i.e., estimation of 3D rotation and 3D translation. The network handles both textured and texture-less objects and is robust to occlusions between objects. For scene understanding, I will show that the convolutional neural network can be extended to a recurrent neural network for semantic mapping on RGB-D videos, where the network interacts with KinectFusion to jointly reconstruct a 3D scene and recognize semantics in the scene.

Presenter's biography

Yu Xiang is a postdoctoral researcher with Prof. Dieter Fox in Computer Science & Engineering at the University of Washington. His research focuses on visual perception in robotics, with an emphasis on understanding objects and scenes in the 3D world from images and videos. Yu Xiang received his Ph.D. in electrical engineering from the University of Michigan at Ann Arbor in 2016, advised by Prof. Silvio Savarese. He was a visiting student researcher in the artificial intelligence lab at Stanford University from 2013 to 2016. He received his M.S. degree in computer science from Fudan University in 2010 and his B.S. degree in computer science from Fudan University in 2007.