GRPO (Group Relative Policy Optimization) Study Notes
We introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO)
We're a tiny team @deepseek_ai exploring AGI.
- introduction
- pretraining data (internet)
- tokenization
- neural network I/O
- neural network internals
- inference
Janus-Series: Unified Multimodal Understanding and Generation Models
DeepSeek R1 Vs ChatGPT 01 (My Experience)
Deepseek-r1 is open source and on par with o1 preview - @bindureddy
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning