Abstract: This talk has two main parts, followed by a sampling of other 3D reconstruction projects I have worked on.
In the first part, I will describe research on binocular stereo that I performed during my doctoral studies at USC. Our approach addresses stereo from a perceptual organization perspective, using tensor voting and integrating monocular information. Initially, matching candidates for all pixels are generated by a combination of matching techniques. The matching candidates are then embedded in disparity space, where perceptual organization takes place in 3D neighborhoods. The assumption is that correct matches produce salient, coherent surfaces, while wrong ones do not. Matching candidates that are consistent with the surfaces are kept and grouped into smooth layers. We thus achieve surface segmentation based on geometric rather than photometric properties. Errors due to occlusion and other factors can be corrected by removing matches whose projections are not consistent in color with their neighbors on the surface. The resulting refined surfaces are then used to obtain disparity hypotheses for unmatched pixels. I will present results on widely used benchmark stereo pairs.
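As a rough illustration of the assumption that correct matches form salient surfaces, the Python sketch below scores candidates in (x, y, d) disparity space with a simplified ball-voting pass and keeps, for each pixel, the candidate with the highest surface saliency. It is not the method presented in the talk: the closed-form vote (a Gaussian-weighted I - vv^T/|v|^2 term), the scale `sigma`, and the helper names `surface_saliency` and `keep_most_salient` are illustrative assumptions. The oriented (stick) voting, monocular cues and color-based refinement described above are not shown, and the quadratic loop is only meant for small examples.

```python
import numpy as np


def surface_saliency(points, sigma=1.5):
    """Simplified ball-voting pass over 3D points (x, y, d).

    Each neighbor j votes on point i with a Gaussian-weighted tensor
    I - v v^T / |v|^2, where v = p_i - p_j. Neighbors lying on a common
    surface reinforce the surface-normal direction, so lambda1 - lambda2
    of the accumulated tensor serves as a surface saliency.
    """
    pts = np.asarray(points, dtype=float)
    saliency = np.zeros(len(pts))
    normals = np.zeros((len(pts), 3))
    for i, p in enumerate(pts):
        T = np.zeros((3, 3))
        for j, q in enumerate(pts):
            v = p - q
            r2 = v @ v
            if i == j or r2 == 0.0:
                continue  # no self-votes; skip exact duplicates
            weight = np.exp(-r2 / sigma**2)
            T += weight * (np.eye(3) - np.outer(v, v) / r2)
        evals, evecs = np.linalg.eigh(T)  # eigenvalues in ascending order
        saliency[i] = evals[2] - evals[1]
        normals[i] = evecs[:, 2]  # estimated surface normal
    return saliency, normals


def keep_most_salient(candidates, sigma=1.5):
    """For each pixel (x, y), keep the candidate disparity with the highest saliency."""
    saliency, _ = surface_saliency(candidates, sigma)
    best = {}
    for (x, y, d), s in zip(candidates, saliency):
        if (x, y) not in best or s > best[(x, y)][1]:
            best[(x, y)] = (d, s)
    return {pixel: d for pixel, (d, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical data: a plane at d = 5 plus three isolated wrong matches.
    inliers = [(x, y, 5.0) for x in range(5) for y in range(5)]
    outliers = [(0, 0, 9.0), (2, 2, 1.0), (4, 3, 11.0)]
    print(keep_most_salient(inliers + outliers)[(2, 2)])  # 5.0
```

The isolated wrong matches receive almost no support from the coherent plane, so every pixel keeps the candidate at d = 5.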
In the second part, I will talk about real-time video-based reconstruction of urban environments, a project I have worked on for the last two years as a postdoctoral researcher at UNC. Our system collects video from eight cameras along with GPS and INS data. The data are processed offline, but at real-time speeds, to produce geo-registered, detailed 3D models. I will focus on the aspects of the system that I have worked on. These include a novel plane-sweeping stereo algorithm that analyzes sparse information about the scene to optimize the selection of sweeping directions and planes; a depth map fusion approach that merges the stereo depth maps according to visibility constraints; and a scheme that generates the final model as a multi-resolution triangular mesh. I will show reconstructions obtained faster than real time by leveraging the processing power of the GPU, while maintaining an accuracy of a few centimeters.
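A generic two-view plane sweep can be sketched in Python as follows. For each candidate plane n^T X = d in the reference camera frame, and assuming the source camera sees X_src = R X + t, the plane-induced homography H = K_src (R + t n^T / d) K_ref^-1 maps reference pixels into the source image; at each pixel, the plane with the lowest photometric cost wins. This is only an illustrative sketch: the function names, nearest-neighbor sampling and unaggregated absolute-difference cost are assumptions, and it does not reproduce the talk's selection of sweeping directions and planes from sparse scene information. Here the planes are simply supplied by the caller.

```python
import numpy as np


def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography induced by the plane n^T X = d (X in reference-camera
    coordinates), mapping reference pixels to source pixels when the source
    camera sees X_src = R @ X + t."""
    return K_src @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)


def plane_sweep(ref, src, K_ref, K_src, R, t, planes):
    """Two-view plane sweep over caller-supplied planes given as (n, d).

    Returns the index of the lowest-cost plane at each reference pixel and
    the full cost volume (np.inf where the warped pixel falls outside src).
    """
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N homogeneous
    costs = np.full((len(planes), h, w), np.inf)
    for k, (n, d) in enumerate(planes):
        H = plane_homography(K_ref, K_src, R, t, np.asarray(n, dtype=float), d)
        q = H @ pix
        u = np.rint(q[0] / q[2]).astype(int)  # nearest-neighbor sampling
        v = np.rint(q[1] / q[2]).astype(int)
        valid = (u >= 0) & (u < src.shape[1]) & (v >= 0) & (v < src.shape[0])
        cost = np.full(h * w, np.inf)
        cost[valid] = np.abs(ref.ravel()[valid] - src[v[valid], u[valid]])
        costs[k] = cost.reshape(h, w)
    return np.argmin(costs, axis=0), costs
```

With fronto-parallel planes (n = (0, 0, 1)), identical intrinsics with focal length f, and a pure horizontal translation t = (-b, 0, 0), the homography reduces to a horizontal shift of f·b/d pixels, i.e. the familiar disparity of rectified stereo. Sweeping along several plane normals, rather than only the fronto-parallel one, amounts to passing planes with different n.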