Reinforcement Learning Algorithm Helps Train Thousands of Robots Simultaneously

In recent years, model-free deep reinforcement learning algorithms have produced groundbreaking results. However, the current algorithms require a large number of training samples as well as an enormous amount of computing power to achieve the desired results. To help make training more accessible, a team of researchers from NVIDIA developed a GPU-accelerated reinforcement learning simulator that can teach a virtual robot human-like tasks in record time.

With just one NVIDIA Tesla V100 GPU and a single CPU core, the team trained the virtual agents to run in less than 20 minutes within the FleX GPU-based physics engine. The work used 10 to 1,000 times fewer CPU cores than previous work. The simulator can even support hundreds to thousands of virtual robots at the same time on a single GPU.

We measured MuJoCo’s single-core CPU simulation time in a setup similar to the one used for our GPU simulation time: at every time step, random actions are given to the 28-DoF humanoids lying on the floor. The CPU used is an Intel Core i9-7960X running at 2.80 GHz.

“Unlike simulating individual robots on each CPU core, we load all simulated agents onto the same scene on one GPU, so they can interact and collide with each other,” the researchers stated. “The peak GPU simulation frame time per agent for the humanoid environment is less than 0.02 ms.”

“Using FleX, we implement an OpenAI Gym-like interface to perform RL experiments for continuous control locomotion tasks,” the team stated.
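
As a rough illustration of what a Gym-like interface over many simultaneous agents can look like, here is a minimal Python sketch. It is not the team's FleX code: the class name `BatchedToyEnv`, the one-dimensional point-mass dynamics, and the reward are all hypothetical stand-ins. The only idea it borrows from the article is that a single `step` call advances every agent in the scene at once.

```python
import random
from typing import Dict, List, Tuple


class BatchedToyEnv:
    """Hypothetical Gym-like environment that steps many agents per call.

    The 1-D point-mass dynamics are a toy stand-in, not FleX physics.
    """

    def __init__(self, num_agents: int, horizon: int = 100, seed: int = 0):
        self.num_agents = num_agents
        self.horizon = horizon
        self._rng = random.Random(seed)
        self._pos = [0.0] * num_agents
        self._t = [0] * num_agents

    def reset(self) -> List[List[float]]:
        """Reset every agent and return one observation per agent."""
        self._pos = [self._rng.uniform(-0.1, 0.1) for _ in range(self.num_agents)]
        self._t = [0] * self.num_agents
        return [[p] for p in self._pos]

    def step(
        self, actions: List[float]
    ) -> Tuple[List[List[float]], List[float], List[bool], Dict]:
        """Advance all agents by one time step with one action per agent."""
        if len(actions) != self.num_agents:
            raise ValueError("expected one action per agent")
        obs, rewards, dones = [], [], []
        for i, action in enumerate(actions):
            a = max(-1.0, min(1.0, action))  # clip to the valid action range
            self._pos[i] += 0.1 * a          # toy dynamics: move along a line
            self._t[i] += 1
            done = self._t[i] >= self.horizon
            rewards.append(0.1 * a)          # toy reward: forward progress
            dones.append(done)
            if done:
                # Reset only the finished agent so the others keep running.
                self._pos[i] = 0.0
                self._t[i] = 0
            obs.append([self._pos[i]])
        return obs, rewards, dones, {}


if __name__ == "__main__":
    env = BatchedToyEnv(num_agents=4, horizon=3)
    observations = env.reset()
    observations, rewards, dones, info = env.step([1.0, 0.5, -2.0, 0.0])
    print(rewards, dones)
```

Resetting finished agents individually, rather than the whole scene, is one plausible design choice when all agents share a single simulation; the source does not say how the team handles episode boundaries.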

GPU Simulation Speed. We measure the speed of GPU simulation for the humanoid task as we increase the number of concurrent humanoids simulated. The total simulations per second peaks at around 60 kHz for 750 humanoids, and the best mean GPU simulation frame time per agent is less than 0.02 ms. The simulation time grows much more slowly than the number of humanoids because of the constant CUDA kernel launch overhead, which dominates the total step time when only a few humanoids are simulated.
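
The figures in this caption can be cross-checked with a little arithmetic. The short Python sketch below assumes that "simulations per second" counts one step of one humanoid, so 60 kHz means about 60,000 humanoid steps per second. The fixed-overhead model in `step_time_ms` is an illustrative assumption with made-up constants, not a measurement from the paper.

```python
def per_agent_frame_time_ms(agent_steps_per_second: float) -> float:
    """Mean time attributed to one agent for one step, in milliseconds."""
    return 1000.0 / agent_steps_per_second


def scene_step_time_ms(agent_steps_per_second: float, num_agents: int) -> float:
    """Wall-clock time for one step of the whole scene, in milliseconds."""
    return num_agents * per_agent_frame_time_ms(agent_steps_per_second)


def step_time_ms(
    num_agents: int, launch_overhead_ms: float, per_agent_cost_ms: float
) -> float:
    """Assumed cost model: fixed launch overhead plus linear per-agent work."""
    return launch_overhead_ms + num_agents * per_agent_cost_ms


if __name__ == "__main__":
    # Figures quoted in the article: about 60 kHz total with 750 humanoids.
    print(per_agent_frame_time_ms(60_000))
    print(scene_step_time_ms(60_000, 750))

    # Hypothetical constants, chosen only to show how a fixed overhead
    # is amortized as more agents share each simulation step.
    for n in (1, 10, 100, 750):
        total = step_time_ms(n, launch_overhead_ms=5.0, per_agent_cost_ms=0.01)
        print(n, total, total / n)
```

Under that reading, 60,000 steps per second works out to roughly 0.0167 ms per humanoid, consistent with the "less than 0.02 ms" figure, and one step of a 750-humanoid scene takes about 12.5 ms. The loop shows the amortization effect the caption describes: with a fixed overhead, the per-agent cost falls as the number of agents rises.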

Using the OpenAI Roboschool and the DeepMind Parkour environments, the team trained the virtual agents to run toward changing targets, recover from falls, and run on complex and uneven terrains.

Prior work compared with the new NVIDIA work

Algorithm	CPU Cores	GPUs	Time (mins)
Evolution Strategies	1440	–	10
Augmented Random Search	48	–	21
Distributed Prioritized Experience Replay	32	1	240
Proximal Policy Optimization w/ GPU Simulation (Ours)	1	1	16

Resources and Times for Training a Humanoid to Run.

“In contrast to prior works that trained locomotion tasks on CPU clusters, with some using hundreds to thousands of CPU cores, we are able to train a humanoid to run in less than 20 minutes on a single machine with 1 GPU and CPU core, making GPU-accelerated RL simulation a viable alternative to CPU-based ones,” the team explained in their paper.

The work is an ongoing research project at NVIDIA. The paper will be presented at the Conference on Robot Learning in Zurich, Switzerland, this week.

The team says they will next train their virtual agents in more complex humanoid environments, allowing the humanoid to actively control the orientation of the rays used to generate the height map. This could lead to the virtual agents navigating obstacles in mid-air.

The team comprises researchers Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, and Dieter Fox.