Scaling Laws for Reward Model Overoptimization

Reward model, Reinforcement learning