Abstract
A successful autonomous system needs not only to understand the visual world but also to communicate its understanding to humans. Toward this goal, language can serve as a natural link between high-level semantic concepts and low-level visual perception. In this talk, I’ll present our recent work on 3D scene understanding and show how natural sentential descriptions can be exploited to improve 3D visual parsing and, vice versa, how image information can help resolve ambiguities in text. I’ll also show how additional information available on the web can facilitate 3D understanding of monocular imagery.