Group Relative Policy Optimization
A reinforcement learning algorithm proposed by DeepSeek.
Prof. Mihai Nica has a nice explainer video: "How does DeepSeek learn? GRPO explained with Triangle Creatures".
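To give a rough sense of what "group relative" means: GRPO samples several outputs for the same prompt, scores each one, and rates every output against the rest of its group instead of against a learned value function. Below is a minimal sketch of that normalization step only, not a full training loop. The function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group.

    rewards: rewards for several sampled outputs of the SAME prompt.
    eps: small constant (an assumption here) that avoids division by
         zero when every output in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers: two judged correct (1.0),
# two judged wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Outputs that beat the group average get a positive advantage and are reinforced; those below it get a negative one. If every output in a group earns the same reward, all advantages are zero and that group contributes no learning signal.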