A team of Uber AI researchers has achieved record high scores and beaten previously unsolved Atari games with algorithms that remember and build off their past successes.
Highlighted this week in Nature, the Go-Explore family of algorithms is designed to address limitations of traditional reinforcement learning algorithms, which struggle with complex games that provide sparse or deceptive feedback.
Performance on Atari games is a popular benchmark for reinforcement learning algorithms. But many algorithms fail to thoroughly explore promising avenues, instead going off track to find potential new solutions.
In this paper, the researchers applied a simple principle, “first return, then explore,” creating algorithms that remember promising states from past games, return to those states, and then intentionally explore from that point to further maximize reward.
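To make the “first return, then explore” idea concrete, here is a minimal sketch in Python on a toy problem. It is not the researchers' implementation: the `step` simulator interface, the corridor environment, the uniform choice of which remembered state to revisit, and all parameter values are assumptions made for illustration.

```python
import random

GOAL = 20  # hypothetical corridor length; reward is given only at the far end


def corridor_step(state, action):
    """Toy deterministic environment: action 1 moves right, anything else left."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL and state != GOAL else 0.0
    return nxt, reward


def go_explore(step, start_state, n_actions, iterations=200, explore_steps=10, seed=0):
    """Toy 'first return, then explore' loop for a deterministic environment.

    The archive maps each state seen so far to the best
    (score, action_sequence) found for reaching it.
    """
    rng = random.Random(seed)
    archive = {start_state: (0.0, [])}
    for _ in range(iterations):
        # Select a remembered state (uniformly here, an assumption of this sketch).
        state = rng.choice(list(archive))
        score, actions = archive[state]
        # Return: replay the stored actions from the start to get back there.
        current = start_state
        for a in actions:
            current, _ = step(current, a)
        # Explore: take random actions from the restored state.
        trajectory = list(actions)
        for _ in range(explore_steps):
            a = rng.randrange(n_actions)
            current, reward = step(current, a)
            score += reward
            trajectory.append(a)
            # Remember new states, or better-scoring ways to reach known ones.
            if current not in archive or score > archive[current][0]:
                archive[current] = (score, list(trajectory))
    return archive


archive = go_explore(corridor_step, start_state=0, n_actions=2)
print(f"remembered {len(archive)} states; farthest reached: {max(archive)}")
```

Because each exploration phase starts from a state the agent has already reached, progress is kept rather than rediscovered from the beginning of the game each time.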
The researchers used a variety of NVIDIA GPUs at OpenAI and Uber data centers to develop the algorithms.
The software determines which plays to revisit by storing screen grabs of past games and grouping together similar-looking images to find starting points it should return to in future rounds.
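One simple way to group similar-looking images, sketched below in Python, is to shrink each frame and coarsen its brightness values so that near-identical screens collapse to the same key. The function names, the block size, and the number of brightness levels are illustrative assumptions, not values taken from the paper.

```python
def frame_to_cell(frame, block=4, levels=8, max_value=255):
    """Map a grayscale frame (list of rows of ints) to a coarse, hashable key.

    Averages each block x block patch, then quantizes the average into
    `levels` brightness buckets. Frames that look alike share a key.
    """
    height, width = len(frame), len(frame[0])
    key = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            patch = [frame[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            mean = sum(patch) / len(patch)
            row.append(min(levels - 1, int(mean * levels / (max_value + 1))))
        key.append(tuple(row))
    return tuple(key)


def group_frames(frames):
    """Group frame indices by their coarse key."""
    groups = {}
    for index, frame in enumerate(frames):
        groups.setdefault(frame_to_cell(frame), []).append(index)
    return groups


# Three tiny 8x8 "screens": two nearly identical, one much brighter.
frame_a = [[100] * 8 for _ in range(8)]
frame_b = [row[:] for row in frame_a]
frame_b[0][0] = 110  # a one-pixel difference
frame_c = [[200] * 8 for _ in range(8)]

groups = group_frames([frame_a, frame_b, frame_c])
print(sorted(groups.values()))
```

Each distinct key can then serve as one candidate starting point, so the number of remembered states stays manageable even when no two raw frames are pixel-for-pixel identical.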
“The reason our approach hadn’t been considered before is that it differs strongly from the dominant approach that has historically been used for addressing these problems in the reinforcement learning community, called ‘intrinsic motivation,’” said researchers Adrien Ecoffet, Joost Huizinga, and Jeff Clune. “In intrinsic motivation, instead of dividing exploration into returning and exploring like we do, the agent is simply rewarded for discovering new areas.”
With the return-and-explore technique, Go-Explore achieved massive improvements on a collection of 55 Atari games, beating state-of-the-art algorithms 85.5 percent of the time. The algorithm set a record — beating both the human world record and past reinforcement learning records — on the complex Montezuma’s Revenge game.
The paper also demonstrated how Go-Explore could be applied to real-world challenges including robotics, drug design, and language processing.