Efficient Meta-Reinforcement Learning via Probabilistic Context Variables
Kate Rakelly, University of California, Berkeley
We'll start with why it's important to design reinforcement learning agents that can leverage prior experience to generalize and adapt quickly to new tasks and environments. We'll show one way to formalize this problem as meta-reinforcement learning and give some examples. Most of the talk focuses on our approach, PEARL, which adapts by conditioning the RL agent on latent context variables inferred from collected experience. We'll explain how this approach lets us use off-policy RL algorithms, which is why PEARL is 20-100x more sample-efficient than prior meta-RL methods. We'll also discuss the problem of exploration in meta-RL and how PEARL uses Thompson sampling (posterior sampling) to explore.
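The snippet below is a minimal sketch of inferring a latent context and exploring by Thompson sampling. It rests on a few assumptions. Following the PEARL paper, each collected transition is encoded into an independent diagonal Gaussian factor, and the posterior over the context z is the product of those factors, with a standard-normal prior N(0, I) used before any experience has been collected. The encoder network and the z-conditioned policy are not shown. The names `posterior_from_factors` and `sample_context`, along with the factor values in the usage example, are hypothetical and for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple

# A diagonal Gaussian over the latent context z: (means, variances), one entry per dimension.
Gaussian = Tuple[List[float], List[float]]


def posterior_from_factors(factors: Sequence[Gaussian], dim: int) -> Gaussian:
    """Multiply per-transition diagonal Gaussian factors into a single Gaussian.

    With no collected experience, fall back to the standard-normal prior N(0, I).
    """
    if not factors:
        return [0.0] * dim, [1.0] * dim
    precision = [0.0] * dim       # running sum of 1 / sigma_n^2 per dimension
    weighted_means = [0.0] * dim  # running sum of mu_n / sigma_n^2 per dimension
    for means, variances in factors:
        for d in range(dim):
            precision[d] += 1.0 / variances[d]
            weighted_means[d] += means[d] / variances[d]
    variances_out = [1.0 / p for p in precision]
    means_out = [w * v for w, v in zip(weighted_means, variances_out)]
    return means_out, variances_out


def sample_context(posterior: Gaussian, rng: random.Random) -> List[float]:
    """Thompson-style step: draw one hypothesis z about the current task."""
    means, variances = posterior
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]


# Usage with hypothetical numbers (no real encoder or policy involved).
rng = random.Random(0)
prior = posterior_from_factors([], dim=2)
z = sample_context(prior, rng)  # explore under a task hypothesis drawn from the prior

# Hypothetical factors an encoder might emit for two collected transitions.
factors = [([1.0, 0.0], [1.0, 4.0]), ([3.0, 0.0], [1.0, 4.0])]
post = posterior_from_factors(factors, dim=2)
print(post)  # → ([2.0, 0.0], [0.5, 2.0])
```

In Thompson sampling, the agent draws one z, keeps it fixed while acting, and only then updates the posterior with the new transitions before resampling. Early samples from a broad posterior produce varied, exploratory behavior. As evidence accumulates, the posterior narrows and behavior concentrates on the inferred task.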