Deep Reinforcement Learning at the Edge of the Statistical Precipice

https://arxiv.org/abs/2108.13264

Reinforcement learning, Model evaluation

In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field.
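
To make "uncertainty in results" concrete, here is a minimal sketch of reporting an interval estimate instead of a single point estimate when only a handful of runs are available. It uses a plain percentile bootstrap over runs, which is an assumption chosen for simplicity and not necessarily the exact procedure used in the paper. The function name `bootstrap_ci`, its parameters, and the scores are hypothetical and for illustration only.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-run scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(scores), lo, hi


# Made-up normalized scores from five runs (illustrative values only).
runs = [0.62, 0.71, 0.48, 0.90, 0.55]
point, lo, hi = bootstrap_ci(runs)
print(f"mean={point:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

With so few runs the interval is typically wide. That is the point: comparing two algorithms by their mean scores alone can hide the fact that their intervals overlap substantially.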