I will talk about perceptual representations for robot mobility. I will start by motivating the need for representations that are spatial, object-centric, task-driven, and multi-modal, and discuss three projects that showcase these aspects. First, I will present our cognitive mapping and planning architecture for navigation in novel environments, and describe how we build spatial, task-driven representations from first-person observations through the use of a neural-network-based spatial memory and differentiable planning modules. I will then motivate the need to go beyond monolithic scene-level representations to factored, object-centric 3D representations of scenes, and show how to obtain such expressive representations using computationally efficient object detectors. I will close by showing how representations from multiple modalities can be embedded in a joint space through the use of unlabelled paired data, and how this facilitates the transfer of semantic knowledge between modalities for use in robotics.
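To give a flavour of what a differentiable planning module computes, here is a toy NumPy sketch (not the talk's actual architecture) of the value-iteration backup that such planners unroll as a fixed number of network layers: each grid cell repeatedly takes the best discounted value among its neighbours, a local operation analogous to a convolution followed by a max over action channels. The grid size, reward layout, and discount factor are illustrative choices.

```python
import numpy as np

def value_iteration_step(value, reward, gamma=0.9):
    """One value-iteration backup on a 2D grid.

    Each cell looks at its 4 neighbours (a local, convolution-like
    operation) and keeps the best discounted backed-up value.
    """
    h, w = value.shape
    padded = np.pad(value, 1, constant_values=-np.inf)
    # Values reachable by the 4 moves (the 'action channels').
    neighbours = np.stack([
        padded[0:h,   1:w+1],  # cell above
        padded[2:h+2, 1:w+1],  # cell below
        padded[1:h+1, 0:w],    # cell to the left
        padded[1:h+1, 2:w+2],  # cell to the right
    ])
    return reward + gamma * neighbours.max(axis=0)

# Toy 5x5 grid: only the centre (goal) cell carries reward.
reward = np.zeros((5, 5))
reward[2, 2] = 1.0

value = np.zeros((5, 5))
for _ in range(10):  # k unrolled 'planning' iterations
    value = value_iteration_step(value, reward)
```

After the unrolled iterations, value decays with distance from the goal, so greedily following it yields a path to the goal; in a learned planner the same computation is made differentiable so the reward and transition maps can be trained end to end.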
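The idea of embedding two modalities in a joint space using only unlabelled paired data can be illustrated with a deliberately simplified sketch: given features computed from the same scenes by an RGB model and a depth model, fit a map that pulls the depth features into the RGB embedding space, after which anything trained on RGB embeddings (e.g. a classifier) can be applied to depth inputs. The feature dimensions, the synthetic data, and the use of a linear least-squares map are all assumptions for illustration; the actual work trains deep networks with a paired-data alignment loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired features: 200 scenes, each seen by a depth
# network (32-d) and an RGB network (64-d). No labels are needed,
# only the pairing between the two views of each scene.
depth_feats = rng.normal(size=(200, 32))
M_true = rng.normal(size=(32, 64))               # unknown relation
rgb_feats = depth_feats @ M_true + 0.01 * rng.normal(size=(200, 64))

# Fit a linear map from depth features into the RGB embedding space
# by least squares on the unlabelled pairs.
W, *_ = np.linalg.lstsq(depth_feats, rgb_feats, rcond=None)
aligned = depth_feats @ W

# How well do the aligned depth embeddings match the RGB embeddings?
err = np.abs(aligned - rgb_feats).mean()
```

Once `aligned` lives in the RGB embedding space, semantic knowledge attached to that space (labels, nearest-neighbour structure) transfers to the depth modality for free, which is the mechanism the abstract alludes to.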