Group Relative Policy Optimization

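A reinforcement learning algorithm proposed by DeepSeek. Instead of training a separate value model as PPO does, it scores each sampled response relative to the other responses sampled for the same prompt.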
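
The "group relative" part of the name can be illustrated with a small sketch: several responses are sampled for one prompt, each gets a reward, and each response's advantage is its reward normalized by the mean and standard deviation of that group. The function name, the eps constant, the use of the population standard deviation, and the reward values below are assumptions made for illustration; this is not DeepSeek's implementation, and it leaves out the rest of the algorithm (the clipped policy-ratio objective and the KL penalty).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: `eps` and the choice of population standard
    deviation are illustrative assumptions, not taken from the source.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no learned value function is needed as a baseline. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.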

Prof. Mihai Nica has a nice explanation video: "How does DeepSeek learn? GRPO explained with Triangle Creatures".