Spring 2022 GRASP SFI: Georgios Georgakis, University of Pennsylvania, “Cross-modal Map Learning for Vision and Language Navigation”

Start: 2022-04-13T15:00:00-04:00
End: 2022-04-13T16:00:00-04:00
Location: Levine 512

*This was a HYBRID Event with in-person attendance in Levine 512 and Virtual attendance…

ABSTRACT

We consider the problem of Vision-and-Language Navigation (VLN) in previously unseen realistic indoor environments. Arguably, the biggest challenge in VLN is grounding the natural language to the visual input. The majority of current methods for VLN are trained end-to-end using either unstructured memory such as an LSTM or cross-modal attention over the agent's egocentric RGB-D observations. We are motivated by studies of navigation in biological systems that suggest humans build cognitive maps during such tasks. In contrast to other works, we argue that an egocentric map offers a more natural representation for this task. In this talk, we will explore a novel navigation system for the VLN task in continuous environments that learns a language-informed representation for both map and trajectory prediction. This approach semantically grounds the language through an egocentric map prediction task that learns to hallucinate information outside the agent's field of view. This is followed by spatial grounding of the instruction through path prediction on the egocentric map. We experimentally test the basic hypothesis that language-driven navigation can be solved given a map, and then show competitive results on the full VLN-CE benchmark.

Presenter

Georgios Georgakis

Georgios Georgakis is a Postdoctoral Researcher in the GRASP lab at the University of Pennsylvania working with Prof. Kostas Daniilidis. He received his PhD and MSc from George Mason University, advised by Prof. Jana Kosecka. His research interests lie at the intersection of Computer Vision and Robotics, and he has worked on multiple topics such as object detection and recognition in RGB-D data, keypoint and descriptor learning, 3D object pose estimation, human mesh recovery, and visual-based navigation. He has spent time as a research intern at Siemens Corporate Technology and United Imaging Intelligence. Prior to joining George Mason University, he completed his Diploma in Computer Engineering at the Technical University of Crete in Greece, working on landmark recognition and localization for an online robotic soccer competition.

Details

Date: April 13, 2022
Time: 3:00 pm - 4:00 pm
Event Category: Seminars

Venue

Levine 512

3330 Walnut Street
Philadelphia