Reward model

Reinforcement learning trains an agent from reward signals. Often, a reward function is hard to specify, especially when human judgement is the real criterion of success. In such cases, we can train a reward model that learns to imitate human judgement, typically from human preference comparisons between candidate outputs. The reward model then stands in for the human, providing reward signals to the agent during training.
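
A minimal sketch of this training setup, assuming pairwise preference data and a Bradley-Terry-style loss (the `RewardNet` architecture and the toy tensors here are hypothetical stand-ins):

```python
# Reward-model training from pairwise human preferences: the loss pushes
# the model to score the human-preferred response above the rejected one.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps an input representation to a scalar reward."""
    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # one scalar reward per example

model = RewardNet(input_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: feature vectors for the preferred (chosen) and rejected
# responses to the same prompts. In practice these would be embeddings
# of (prompt, response) pairs labeled by human annotators.
chosen = torch.randn(32, 64)
rejected = torch.randn(32, 64)

# Pairwise loss: -log sigmoid(r_chosen - r_rejected).
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```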

Training large language models with reinforcement learning is a good example (e.g., InstructGPT): a reward model is trained on human rankings of model outputs, and the language model is then fine-tuned (with PPO in InstructGPT) to maximize the modeled reward.
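
A hypothetical sketch of how the trained reward model supplies rewards during RL fine-tuning. Following InstructGPT, a per-sample KL penalty against the initial policy is subtracted from the modeled reward to keep the policy from drifting too far; all names and tensors here are stand-ins:

```python
import torch
import torch.nn as nn

reward_model = nn.Linear(64, 1)   # stand-in for the trained reward model
beta = 0.1                        # KL penalty coefficient (assumed value)

features = torch.randn(8, 64)     # embeddings of sampled (prompt, response) pairs
policy_logprob = torch.randn(8)   # log-prob of each response under the current policy
init_logprob = torch.randn(8)     # log-prob under the initial (pre-RL) policy

with torch.no_grad():
    proxy_reward = reward_model(features).squeeze(-1)

# Reward passed to the RL algorithm (e.g., PPO): proxy reward minus KL term.
rl_reward = proxy_reward - beta * (policy_logprob - init_logprob)
```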

Gao2022scaling studies how reward model overoptimization scales: optimizing too hard against the learned proxy reward eventually decreases the true reward it was meant to approximate, a Goodhart's-law effect.
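
For reference, the functional forms fitted in that paper, reproduced from memory and worth verifying against the source. With $d$ the square root of the KL divergence between the optimized and initial policies, the gold reward follows, for best-of-$n$ sampling and RL respectively:

```latex
\[
d := \sqrt{D_{\mathrm{KL}}(\pi \,\|\, \pi_{\mathrm{init}})}
\]
\[
R_{\mathrm{bon}}(d) = d\,(\alpha_{\mathrm{bon}} - \beta_{\mathrm{bon}}\, d)
\qquad
R_{\mathrm{RL}}(d) = d\,(\alpha_{\mathrm{RL}} - \beta_{\mathrm{RL}} \log d)
\]
```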