Facebook researchers developed a reinforcement learning model that can outmatch human competitors in heads-up no-limit Texas hold'em, as well as turn endgame hold'em poker.
At the heart of the model is the difference between how software agents handle perfect-information games, such as chess, and imperfect-information games, such as poker.
Instead of just deciding on its next move, a reinforcement learning agent playing Texas hold'em needs to make decisions based on all of the information it has seen so far. It also needs to make complex predictions about what it cannot see in its opponent's hand.
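One standard way an agent reasons about what it cannot see is to maintain a probability distribution, a belief, over the opponent's possible hidden hands and update it with Bayes' rule after each observed action. The sketch below is purely illustrative: the hand categories and action likelihoods are made-up numbers, not anything from the ReBeL paper.

```python
# Illustrative belief update for an imperfect-information game.
# The agent cannot see the opponent's cards, so it tracks a
# probability over coarse hand categories and re-weights it
# after observing the opponent's action.

def update_belief(belief, action_likelihood):
    """Bayes update: posterior(hand) is proportional to prior(hand) * P(action | hand)."""
    posterior = {hand: p * action_likelihood[hand] for hand, p in belief.items()}
    total = sum(posterior.values())
    return {hand: p / total for hand, p in posterior.items()}

# Uniform prior over three coarse (hypothetical) hand categories.
belief = {"strong": 1 / 3, "medium": 1 / 3, "weak": 1 / 3}

# Assumed probability that each hand type raises (illustrative numbers).
p_raise = {"strong": 0.8, "medium": 0.4, "weak": 0.1}

# The opponent raises: the belief shifts toward strong hands.
belief = update_belief(belief, p_raise)
```

After the raise is observed, the belief assigns roughly 62% to a strong hand and only about 8% to a weak one, which is the kind of hidden-state inference a poker agent must fold into every decision.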
“ReBeL is effective in large-scale two-player zero-sum imperfect-information games and defeats a top human professional with statistical significance,” the researchers stated in their paper, Combining Deep Reinforcement Learning and Search for Imperfect-Information Games.
What sets this model apart from previous AI poker systems is that it uses far less domain knowledge than any prior poker AI, the researchers explained.
“Our algorithm trains a value network and a policy network…through self-play reinforcement learning. Additionally, the algorithm uses the value and policy network for search during self-play,” the researchers explained.
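The quoted description — train a value network and a policy network through self-play, then use them to guide search during that same self-play — can be caricatured in a few lines of plain Python. Everything below (the ToyAgent class, the one-state game, the update rules) is an illustrative stand-in for the loop's overall shape, not ReBeL's actual architecture or search procedure.

```python
import random

random.seed(0)  # deterministic toy run

class ToyAgent:
    def __init__(self):
        self.value = {}   # state -> estimated value (stand-in for the value network)
        self.policy = {}  # state -> action probabilities (stand-in for the policy network)

    def search(self, state, actions):
        # Search leans on the current policy/value estimates to pick a move;
        # in ReBeL this is a full game-solving search, here it is just greedy.
        probs = self.policy.get(state, {a: 1 / len(actions) for a in actions})
        return max(probs, key=probs.get)

    def train_step(self, state, action, reward, actions):
        # Move the value estimate toward the observed self-play outcome.
        v = self.value.get(state, 0.0)
        self.value[state] = v + 0.1 * (reward - v)
        # Nudge the policy toward actions that beat the current value estimate.
        probs = self.policy.get(state, {a: 1 / len(actions) for a in actions})
        if reward > v:
            probs[action] += 0.1
        total = sum(probs.values())
        self.policy[state] = {a: p / total for a, p in probs.items()}

# Self-play on a trivial one-state game where action "b" pays more than "a".
agent = ToyAgent()
payoff = {"a": 0.0, "b": 1.0}
for _ in range(200):
    action = random.choice(["a", "b"])  # explore during self-play
    agent.train_step("root", action, payoff[action], ["a", "b"])
```

After a few hundred self-play episodes, the toy policy concentrates on the better action and the value estimate rises above zero — the same feedback loop, at cartoon scale, that the researchers run with deep networks.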
To train the model, the team used the PyTorch deep learning framework with 90 NVIDIA DGX-1 systems, each with eight NVIDIA V100 GPUs.
Training was performed for 1,750 epochs, with each epoch comprising 2,560,000 examples.
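Those two figures imply the total volume of self-play training data:

```python
# Total training examples from the reported figures.
epochs = 1750
examples_per_epoch = 2_560_000
total_examples = epochs * examples_per_epoch  # 4,480,000,000 — roughly 4.5 billion
```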
Reinforcement learning algorithms are commonly used in robotics to teach robots, through simulation, how to operate safely and efficiently in the real world. The researchers say this model has numerous potential applications, such as autonomous vehicle navigation and helping robots interact more effectively with their environment.
The researchers have decided not to release the poker model publicly and have instead published an open-source implementation for Liar's Dice, a recreational game that humans do not play as competitively.