AI Tourist Finds Its Way Around New York City with the Help of Another AI Algorithm

Can you picture what it would be like to navigate the streets of New York City without a smartphone? If you’re a local, it’s easy, but if you’re a tourist, it can be daunting. To help alleviate the problem, researchers from the University of Montreal in Canada and Facebook developed a deep learning-based system called “Talk the Walk” that can give walking directions without actually knowing your location.
The method uses two virtual AI agents, a “tourist” and a “guide,” that find their way through the Big Apple by communicating in natural language. The virtual guide has access to a 2D map, and the tourist sees photos of the real world. The pair are given a navigation challenge that they can solve only by working together.
“Grounded language learning has re-gained traction in the AI community, and much attention is currently devoted to virtual embodiment —the development of multi-agent communication tasks in virtual environments—which has been argued to be a viable strategy for acquiring natural language semantics,” the researchers wrote in their paper.

Example of the Talk The Walk task: two agents, a “tourist” and a “guide,” interact with each other via natural language so that the tourist can navigate to the correct location. The guide has access to a map and knows the target location but not the tourist’s location, while the tourist does not know the way but can navigate in a 360-degree street view environment.
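
To make the guide's side of the task concrete, here is a minimal toy sketch in Python. It is not the researchers' model: the 3x3 map, the landmark labels, and the `localize` function are all hypothetical, invented only to illustrate how a guide that cannot see the tourist might narrow down the tourist's position from reported landmarks and moves.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical guide map: one made-up landmark label per grid cell.
GUIDE_MAP: Dict[Cell, str] = {
    (0, 0): "bar",  (1, 0): "bank", (2, 0): "shop",
    (0, 1): "cafe", (1, 1): "bar",  (2, 1): "theater",
    (0, 2): "shop", (1, 2): "cafe", (2, 2): "bank",
}

# Assumed action vocabulary for the tourist (one grid step per action).
MOVES: Dict[str, Cell] = {
    "north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0),
}


def localize(observations: List[str], actions: List[str]) -> List[Cell]:
    """Return every cell where the tourist could be standing now.

    observations[0] is what the tourist saw at the start;
    observations[i + 1] is what the tourist saw after actions[i].
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")

    candidates: List[Cell] = []
    for start in GUIDE_MAP:
        pos = start
        consistent = GUIDE_MAP[pos] == observations[0]
        for action, seen in zip(actions, observations[1:]):
            if not consistent:
                break
            dx, dy = MOVES[action]
            pos = (pos[0] + dx, pos[1] + dy)
            # Cells off the map return None and so never match a report.
            consistent = GUIDE_MAP.get(pos) == seen
        if consistent:
            candidates.append(pos)
    return sorted(candidates)


# One report leaves two possible cells; a second report after moving east
# leaves only one.
print(localize(["bar"], []))
print(localize(["bar", "bank"], ["east"]))
```

In this toy version the tourist's messages are already structured labels. In the task described in the article, the agents exchange free-form natural language and the tourist perceives real street-view imagery, which is what makes the problem hard.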

Using NVIDIA Tesla P100 GPUs, with the cuDNN-accelerated PyTorch deep learning framework, the team trained their recurrent neural network on 360-degree images of several neighborhoods of New York, including Hell’s Kitchen, the East Village, the Financial District, Williamsburg, and the Upper East Side.

The team enlisted both humans and AI agents to complete a localization task. The results showed that the neural network outperformed the human localizers, achieving 87.08% accuracy compared with 76% for the humans.
“We believe Talk The Walk is a useful resource for grounded language learning, and hope it will facilitate research on the intersection of computer vision, goal-directed dialogue systems, reinforcement learning, navigation, and planning,” the researchers stated.
The researchers are releasing their code on GitHub in hopes that other scientists will use it to further their research.