Deep Reinforcement Learning at the Edge of the Statistical Precipice

Reinforcement learning, Model evaluation

In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without risking slower progress in the field.
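
To make the notion of uncertainty in the few-run regime concrete, the following minimal sketch computes a percentile bootstrap confidence interval for the mean score over a handful of runs. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the scores are made-up placeholder values, and the helper name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints off the sorted bootstrap means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Placeholder final scores from 5 runs of one algorithm on one task
# (invented values, not results from any experiment).
scores = [0.62, 0.71, 0.35, 0.80, 0.55]
point = statistics.fmean(scores)
low, high = bootstrap_mean_ci(scores)
print(f"mean = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With so few runs the interval is typically wide relative to the differences often reported between algorithms, which is why a point estimate alone can mislead.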