Published on Wed Dec 09 2020

Topological Planning with Transformers for Vision-and-Language Navigation

Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese

Abstract

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
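
The plan-then-execute split described above lends itself to a compact illustration. Below is a minimal sketch, assuming PyTorch, of the core planning step: topological-map nodes cross-attend to instruction tokens, and each node receives a "visit next" score. The module names, dimensions, and greedy argmax decoding are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TopologicalPlanner(nn.Module):
    """Scores topological-map nodes against an instruction via cross-attention."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Map nodes act as queries over the instruction tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.score = nn.Linear(d_model, 1)  # per-node "visit next" logit

    def forward(self, instr_emb: torch.Tensor, node_emb: torch.Tensor) -> torch.Tensor:
        # instr_emb: (B, L, D) embedded instruction tokens
        # node_emb:  (B, N, D) embedded map nodes (e.g. per-node visual features)
        attended, _ = self.cross_attn(query=node_emb, key=instr_emb, value=instr_emb)
        return self.score(attended).squeeze(-1)  # (B, N) logits over map nodes

# Toy usage: greedily pick the next waypoint; a low-level controller
# (forward / rotate actions) would then drive the agent toward that node.
planner = TopologicalPlanner()
instr = torch.randn(1, 12, 128)  # 12 instruction tokens, D = 128
nodes = torch.randn(1, 20, 128)  # 20 nodes in the topological map
next_node = planner(instr, nodes).argmax(dim=-1)  # shape (1,)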

Related Papers

Wed Jul 29 2020
Computer Vision
Object-and-Action Aware Model for Visual Language Navigation
Vision-and-Language Navigation (VLN) requires turning relatively general natural-language instructions into robot agent actions on the basis of the visible environment. VLN requires extracting value from two very different types of natural-language information.

Mon Sep 24 2018
Artificial Intelligence
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user's instructions and a topological representation of the environment.

Fri May 14 2021
NLP
Towards Navigation by Reasoning over Spatial Configurations
We show the importance of spatial semantics in grounding navigation instructions into visual perceptions. We propose a neural agent that uses the elements of spatial configurations. Our neural agent improves strong baselines on both seen and unseen environments.

Wed Jul 03 2019
Artificial Intelligence
Chasing Ghosts: Instruction Following as Bayesian State Tracking
A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. We formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) within the framework of Bayesian state tracking.

Fri Mar 05 2021
Artificial Intelligence
Structured Scene Memory for Vision-Language Navigation

Fri Mar 01 2019
Artificial Intelligence
A Behavioral Approach to Visual Navigation with Graph Localization Networks
We propose using graph neural networks for localizing the agent in the map. We decompose the action space into primitive behaviors implemented as convolutional or recurrent neural networks. Using the Gibson simulator, we verify that our approach outperforms relevant baselines.
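
The last entry above also lends itself to a short illustration. Below is a minimal sketch, assuming PyTorch, of graph-based localization: one round of neighbor averaging over the map's adjacency (a simple message-passing step), then dot-product matching of node embeddings against the agent's current observation. All names, shapes, and the single-layer design are illustrative assumptions, not that paper's actual architecture.

import torch
import torch.nn as nn

class GraphLocalizer(nn.Module):
    """Estimates which map node the agent is at, given its current view."""
    def __init__(self, d: int = 64):
        super().__init__()
        self.update = nn.Linear(2 * d, d)  # combines a node with its neighborhood

    def forward(self, node_feat, adj, obs):
        # node_feat: (N, D) per-node visual features; adj: (N, N) 0/1 adjacency
        # obs: (D,) feature vector for the agent's current camera view
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neighbor_mean = adj @ node_feat / deg  # aggregate neighboring nodes
        h = torch.relu(self.update(torch.cat([node_feat, neighbor_mean], dim=-1)))
        return (h @ obs).softmax(-1)  # P(agent is at node i), shape (N,)

# Toy usage on a random 10-node map.
localizer = GraphLocalizer()
node_feat = torch.randn(10, 64)
adj = (torch.rand(10, 10) > 0.7).float()
probs = localizer(node_feat, adj, torch.randn(64))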