Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback, LLMs