GRASP Special Seminar - Aleš Leonardis, University of Ljubljana, "Combining Compositional Shape Hierarchy and Multi-Class Object Taxonomy for Efficient Object Categorization"

Abstract: Visual categorization has been an area of intensive research in the vision community for several decades. Ultimately, the goal is to efficiently detect and recognize an increasing number of object classes. The problem entangles three highly interconnected issues: an internal object representation that compactly captures the visual variability of objects and generalizes well over each class; a means of learning this representation from a set of input images with as little supervision as possible; and an effective inference algorithm that robustly matches the object representation against the image and scales favorably with the number of object classes. In this talk I will present a novel approach that combines a learned compositional hierarchy, representing the 2D shapes of multiple object classes, with a coarse-to-fine matching scheme that exploits a taxonomy of object classes to perform efficient object detection.

Our framework for learning a hierarchical compositional shape vocabulary for multiple object classes starts from simple contour fragments and learns their frequent spatial configurations. These are recursively combined into increasingly complex and class-specific shape compositions, each of which accommodates a high degree of shape variability. At the top level of the vocabulary, the compositions represent the whole shapes of the objects. The vocabulary is learned layer by layer, gradually increasing the size of the window of analysis and reducing the spatial resolution at which the shape configurations are learned. The lower layers are learned jointly on images of all classes, whereas the higher layers are learned incrementally, by presenting the algorithm with one object class after another.

However, for recognition systems to scale to a larger number of object categories and achieve running times logarithmic in the number of classes, building visual class taxonomies becomes necessary. We propose an approach for speeding up recognition with multi-class part-based object representations. The main idea is to construct a taxonomy of constellation models cascaded from coarse to fine resolution and to use it in recognition with an efficient search strategy. The structure and depth of the taxonomy are determined automatically so as to minimize the expected number of computations during recognition, by optimizing the cost-to-power ratio. Combining the learned taxonomy with the compositional shape hierarchy achieves efficiency both in representing the structure of objects and in handling a growing number of modeled object classes. The experimental results show that the learned multi-class object representation achieves detection performance comparable to that of current state-of-the-art flat approaches, with both faster inference and shorter training times.
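Why a taxonomy can give running times logarithmic in the number of classes can be illustrated with a small sketch. It is not the method presented in the talk. The names `Node`, `build_balanced` and `coarse_to_fine_search` are hypothetical. The taxonomy here is a simple balanced halving of the class list, standing in for the taxonomy learned by optimizing the cost-to-power ratio. The `matches` callback is an idealized oracle standing in for evaluating a coarse shared model. The point is only that subtrees whose coarse model fails are pruned without evaluating any of the classes below them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Node:
    """A node in a hypothetical class taxonomy.

    Leaves hold a single class; an internal node stands for a coarse model
    shared by all classes below it (an assumption for illustration).
    """
    classes: Set[str]
    children: List["Node"] = field(default_factory=list)


def build_balanced(classes: List[str]) -> Node:
    """Recursively split the class list in half.

    Hypothetical stand-in for a learned taxonomy; the talk instead chooses
    structure and depth by optimizing a cost-to-power ratio.
    """
    node = Node(set(classes))
    if len(classes) > 1:
        mid = len(classes) // 2
        node.children = [build_balanced(classes[:mid]), build_balanced(classes[mid:])]
    return node


def coarse_to_fine_search(
    root: Node, matches: Callable[[Node], bool]
) -> Tuple[List[str], int]:
    """Descend only into subtrees whose coarse model matches.

    Returns the detected leaf classes and the number of model evaluations.
    """
    detections: List[str] = []
    evaluations = 0
    stack = [root]
    while stack:
        node = stack.pop()
        evaluations += 1
        if not matches(node):
            continue  # prune: no class below this node is evaluated
        if node.children:
            stack.extend(node.children)  # refine to the next, finer level
        else:
            detections.extend(node.classes)  # a leaf, i.e. a single class
    return detections, evaluations


if __name__ == "__main__":
    classes = [f"class_{i}" for i in range(64)]
    root = build_balanced(classes)
    # Idealized oracle: the image contains exactly one object, of class_37.
    found, n = coarse_to_fine_search(root, lambda node: "class_37" in node.classes)
    print(found, n)  # ['class_37'] 13, versus 64 evaluations for a flat scan
```

With 64 classes and a single object present, the search evaluates the root plus two children at each of the six levels below it, which gives 13 model evaluations. A flat detector would run all 64 class models. When several objects are present, the paths diverge and the count grows with the number of matching branches. That is one reason a learned taxonomy would weigh the expected cost of each branch rather than simply halving the class list.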

Presenter's biography

Aleš Leonardis is a full professor and the head of the Visual Cognitive Systems Laboratory at the Faculty of Computer and Information Science, University of Ljubljana. He is also an adjunct professor at the Faculty of Computer Science, Graz University of Technology. From 1988 to 1991, he was a visiting researcher in the General Robotics and Active Sensory Perception Laboratory at the University of Pennsylvania. From 1995 to 1997, he was a postdoctoral associate at PRIP, Vienna University of Technology. He was also a visiting researcher at the Swiss Federal Institute of Technology (ETH) in Zurich and a visiting professor at the Technische Fakultaet der Friedrich-Alexander-Universitaet in Erlangen. His research interests include robust and adaptive methods for computer vision, object and scene recognition and categorization, statistical visual learning, 3D object modeling, and biologically motivated vision. He is an author or coauthor of more than 160 papers published in journals and conferences, and he coauthored the book Segmentation and Recovery of Superquadrics (Kluwer, 2000). He is an Editorial Board Member of Pattern Recognition, an Editor of the Springer Book Series Computational Imaging and Vision, and an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence. He has served on the program committees of major computer vision and pattern recognition conferences and was a program co-chair of the European Conference on Computer Vision, ECCV 2006. He has received several awards. In 2002, he coauthored a paper, “Multiple Eigenspaces,” which won the 29th Annual Pattern Recognition Society award. In 2004, he received a prestigious national award for scientific achievement. He is a fellow of the IAPR and a member of the IEEE and the IEEE Computer Society.