Abstract: Object detection and recognition is generally posed as a matching problem between the object representation and the image features (e.g., aligning pictorial cues, shape correspondence, constellations of parts, etc.) while rejecting the background features using an outlier process. In this talk, we take a different approach: we formulate the object detection problem as a problem of aligning elements of the entire scene. The background, instead of being treated as a set of outliers, is used to guide the detection process. Our approach relies on the observation that, given a large enough database, we can find with high probability some images in the database that are very close to a query image, that is, similar scenes with similar objects arranged in similar spatial configurations. If the images in the retrieval set are partially labeled, then we can transfer the knowledge of the labeling to the query image, and the problem of object recognition becomes a problem of aligning scene regions. But can we find a dataset large enough to cover a large number of scene configurations? Given an input image, how do we find a good retrieval set and, finally, how do we transfer the labels to the input image? With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. We will use two datasets: 1) the LabelMe dataset, which contains more than 10,000 labeled images with over 180,000 annotated objects, and 2) the tiny images dataset, a collection of more than 79,000,000 weakly labeled images. We use the latter to perform object classification, examining performance over a range of semantic levels. For certain classes that are particularly prevalent in the dataset, such as people, we demonstrate that simple nearest-neighbor methods achieve performance comparable to dedicated Viola-Jones style detectors.
We also demonstrate a range of other applications including automatic image colorization and orientation determination.
Work in collaboration with Rob Fergus, Bryan Russell, Ce Liu and William T. Freeman.
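
The retrieve-then-transfer idea described in the abstract can be illustrated with a minimal sketch. This is not the method presented in the talk: the descriptor (a plain feature vector), the Euclidean distance, the majority vote, and the function name `knn_label_transfer` are all assumptions made for illustration.

```python
import numpy as np
from collections import Counter


def knn_label_transfer(query, database, labels, k=3):
    """Hypothetical helper: label a query by majority vote over its k nearest neighbors.

    query:    1-D descriptor of the query image (assumed representation)
    database: 2-D array with one descriptor per database image
    labels:   one label per database image
    """
    # Squared Euclidean distance from the query to every database descriptor.
    dists = np.sum((database - query) ** 2, axis=1)
    # Indices of the k closest database images (the retrieval set).
    nearest = np.argsort(dists)[:k]
    # Transfer labels by simple majority vote over the retrieval set.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest


# Toy database: two well-separated clusters of 2-D descriptors.
database = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],   # labeled "street"
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],   # labeled "beach"
])
labels = ["street", "street", "street", "beach", "beach", "beach"]

label, retrieval_set = knn_label_transfer(np.array([4.9, 5.0]), database, labels, k=3)
print(label, sorted(retrieval_set.tolist()))
```

In this toy setting the query falls in the second cluster, so its retrieval set consists of the three "beach" images and that label is transferred. With a real image collection, the quality of the result would depend on the descriptor and on how densely the database samples the space of scenes.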