InstructGPT

Ouyang2022training

An instruction-tuned language model.

It combines GPT with reinforcement learning (Training large language models with reinforcement learning): human preference judgments are used to train a reward model, which then guides further fine-tuning of the language model.
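To make the reward-model step more concrete: in the setup described by Ouyang et al., labelers compare model outputs for the same prompt, and the reward model is trained so that the preferred output receives the higher score. The following is a minimal sketch of that pairwise comparison loss, not the paper's implementation. It assumes the scalar rewards have already been produced by some model, and the function names `pairwise_reward_loss` and `mean_pairwise_loss` are hypothetical.

```python
import math


def pairwise_reward_loss(reward_preferred, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    output above the rejected one, and large when it gets the order wrong.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs):
    """Average the loss over (reward_preferred, reward_rejected) pairs."""
    return sum(pairwise_reward_loss(w, l) for w, l in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative reward values only, not data from the paper.
    agrees_with_human = pairwise_reward_loss(2.0, 0.0)
    disagrees_with_human = pairwise_reward_loss(0.0, 2.0)
    print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss pushes the reward model to rank outputs the way the labelers did. The trained reward model then supplies the reward signal for the reinforcement learning stage.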