Over the past decade, we have seen tremendous advances in computer vision. However, despite stunning performance on some benchmarks, several hurdles remain in bringing these computer vision systems to real-world scenarios like robotics. The first hurdle is recognizing small manipulable objects in scenes, where current systems fail because they rely on bottom-up models that are incapable of top-down reasoning. Another challenge is learning models for rare concepts, including concepts absent from our vision datasets, and generalizing models to rare instances of known concepts.
In this talk, I will present my efforts to address these limitations and bring computer vision systems closer to working in real-world scenarios. I will begin with models that implicitly learn top-down contextual structure, which is necessary to recognize small and challenging objects. I will then describe a large-scale constrained semi-supervised learning framework that uses millions of web images to continuously learn models for new concepts and discover relationships between these concepts, without any human annotations. Finally, I will discuss optimization strategies that help algorithms generalize to rare or unusual examples on their own.