Abstract: If mobile robots are to become ubiquitous, we must first solve fundamental problems in perception. Before a mobile robot system can act intelligently, it must be given, or must acquire, a representation of the environment that is useful for planning and control. Perception comes before action, and the perception problem is one of the most difficult we face. An important goal in mobile robotics is the development of perception algorithms that allow for persistent, long-term autonomous operation (over weeks or more) in unknown situations. In our effort to achieve long-term autonomy, we have had to solve problems of both metric and semantic estimation. In this talk I will describe two recent and interrelated advances in robot perception aimed at enabling long-term autonomy. The first is relative bundle adjustment (RBA). By using a purely relative formulation, RBA addresses the issue of scalability in estimating consistent world maps from vision sensors. In stark contrast to traditional SLAM, I will show that estimation in the relative framework runs in constant time and, crucially, remains so even during loop-closure events. This is important because temporal and spatial scalability are obvious prerequisites for long-term autonomy. Building on RBA, I will then describe co-visibility-based place recognition (CoVis). CoVis is a topo-metric representation of the world based on the RBA landmark co-visibility graph. I will show how this representation simplifies data association and improves the performance of appearance-based place recognition. I will introduce the “dynamic bag-of-words” model, which is a novel form of query expansion based on finding cliques in the co-visibility graph. The proposed approach avoids the often arbitrary discretization of space from the robot’s trajectory that is common to most image-based loop-closure algorithms.
Instead, I will show that reasoning on sets of co-visible landmarks leads to a simple model that outperforms pose-based or view-based approaches in terms of precision and recall. In summary, RBA and CoVis are effective representations and associated algorithms for metric and semantic perception, designed to meet the scalability requirements of long-term autonomous navigation.