Google DeepMind has revealed that its deep learning software can now outperform humans in 31 different Atari games. The algorithm, which uses reinforcement learning to master the games, has been described as the “first significant rung of the ladder” towards proving such a system can work, and as a notable step towards use in real-world applications.
The Double DQN algorithm combines Q-learning with a flexible deep neural network. It was tested on a large and varied set of deterministic Atari 2600 games and reached human-level performance on many of them. Without having the games’ rules programmed into its software, the AI was able to improve by analyzing the pixels on the screen and learning which patterns produce the highest score. For each game, the network was trained on a single GPU for nearly one week.
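To make the "double" in Double DQN a little more concrete, here is a minimal sketch of how its learning target is usually described as differing from the standard deep Q-learning target. It assumes the commonly published formulation, in which one network (the "online" network) picks the best next action and a second network (the "target" network) supplies the value of that action. The function names and the toy Q-values are hypothetical and chosen only for illustration; this is not DeepMind's code.

```python
def dqn_target(reward, gamma, q_target_next, done):
    """Standard deep Q-learning style target (illustrative sketch).

    The same set of estimates both selects and evaluates the next action,
    by taking the maximum over them.
    """
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN style target (illustrative sketch, assumed formulation).

    The online estimates select the next action; the target estimates
    evaluate that selected action.
    """
    if done:
        return reward
    # Index of the action the online network currently prefers.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three actions in the next state.
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [3.0, 1.5, 0.2]

print(dqn_target(1.0, 0.5, q_target_next, done=False))
print(double_dqn_target(1.0, 0.5, q_online_next, q_target_next, done=False))
```

In this toy case the standard target uses the largest target-network value (3.0), while the Double DQN target uses the target-network value of the action the online network prefers (1.5), so the second target is lower.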
The main goal of the paper behind the work was to investigate whether Q-learning's overestimations of action values occur in practice and, when they do occur, whether they hurt performance.
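The overestimation in question can be reproduced in a toy setting. The sketch below is an assumption-laden illustration rather than the paper's experiment: every action has a true value of zero, the estimates are pure zero-mean Gaussian noise, and the function name `simulate` and all parameter values are hypothetical. It compares taking the maximum over one set of noisy estimates with selecting the action from one set and reading its value from a second, independent set.

```python
import random


def simulate(num_actions=10, noise=1.0, trials=20000, seed=0):
    """Average value estimate under single vs. double estimation.

    All true action values are 0, so any nonzero average is pure bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent sets of noisy estimates of the same (zero) values.
        est_a = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        est_b = [rng.gauss(0.0, noise) for _ in range(num_actions)]

        # Single estimator: select and evaluate with the same estimates.
        single_total += max(est_a)

        # Double estimator: select with est_a, evaluate with est_b.
        best = max(range(num_actions), key=lambda a: est_a[a])
        double_total += est_b[best]
    return single_total / trials, double_total / trials


single_avg, double_avg = simulate()
print(f"single estimator average: {single_avg:.3f}")
print(f"double estimator average: {double_avg:.3f}")
```

Under these assumptions the single-estimator average comes out clearly positive even though every true value is zero, while the double-estimator average stays close to zero. Whether such a bias actually harms a learning agent is the separate, empirical question the paper set out to examine.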