ABSTRACT
Machine learning is revolutionizing our world: computers can recognize images, translate language, and even play games competitively with humans. However, there is a missing piece that is necessary for computers to take actions in the real world. My research studies Predictive Vision with the goal of anticipating the future events that may happen. To tackle this challenge, I present predictive vision algorithms that learn directly from large amounts of raw, unlabeled data. Capitalizing on millions of natural videos, my work develops methods for machines to learn to anticipate the visual future, forecast human actions, and recognize ambient sounds.