Abstract: Semantic mapping of the environment requires simultaneous segmentation and categorization of the acquired stream of sensory information. Existing methods typically treat semantic mapping as the final goal and differ in the number and types of semantic categories they consider. We envision semantic understanding of the environment as an ongoing process and seek representations that can be refined and adapted depending on the task and the robot's interaction with the environment.
The proposed approach uses the Conditional Random Field (CRF) framework to infer the semantic categories in a scene (e.g. ground, structure, furniture, and props indoors, or ground, sky, building, vegetation, and objects outdoors). Using visual and 3D data, a novel graph structure and an effective set of features are exploited for efficient learning and inference, obtaining better or comparable results at a fraction of the computational cost on publicly available RGB-D and combined vision and 3D lidar datasets. The chosen representation naturally lends itself to online recursive belief updates with a simple soft data association mechanism, and can seamlessly integrate evidence from multiple sensors with overlapping but possibly different fields of view (FOV), account for missing data, and predict semantic labels over the spatial union of the sensors' coverage.
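To give a rough idea of what an online recursive belief update with soft data association can look like, here is a minimal sketch (not the paper's exact formulation): each map element keeps a per-class probability vector, and each new observation is fused into it, down-weighted by an assumed association weight. All names (`update_belief`, `NUM_CLASSES`, the example likelihoods) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

NUM_CLASSES = 5  # e.g. ground, sky, building, vegetation, objects (outdoor set)

def update_belief(prior, likelihood, association_weight):
    """Recursively fuse a new per-class likelihood into a map element's belief.

    prior              : current class probability vector for the element
    likelihood         : per-class evidence from the latest observation
    association_weight : soft data-association weight in [0, 1]; 0 leaves the
                         prior unchanged (e.g. the element fell outside the FOV)
    """
    # Blend the new evidence with a uniform (non-informative) likelihood
    # in proportion to how confident the data association is.
    uniform = np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)
    blended = association_weight * likelihood + (1.0 - association_weight) * uniform
    posterior = prior * blended          # Bayesian product of beliefs
    return posterior / posterior.sum()   # renormalize

# Example: one map element observed by two sensors with overlapping FOV.
belief = np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)                          # uninformed prior
belief = update_belief(belief, np.array([0.70, 0.10, 0.10, 0.05, 0.05]), 0.9)
belief = update_belief(belief, np.array([0.60, 0.20, 0.10, 0.05, 0.05]), 0.4)
print(belief)
```

In this toy version, evidence from a sensor that does not cover an element (weight 0) has no effect, which is one simple way to predict labels over the spatial union of the sensors' coverage while accounting for missing data.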
Check out the talk here…