Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published May 15 • 36
NGRPO: Negative-enhanced Group Relative Policy Optimization Paper • 2509.18851 • Published Sep 23, 2025 • 2
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization Paper • 2605.19436 • Published 28 days ago • 14
GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding Paper • 2605.15250 • Published May 14 • 13
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published May 13 • 102
F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking Paper • 2605.12995 • Published May 13 • 2
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward Paper • 2605.12495 • Published May 12 • 35
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 67
KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning Paper • 2602.00400 • Published Jan 30
SODA: Semi On-Policy Black-Box Distillation for Large Language Models Paper • 2604.03873 • Published Apr 23 • 2
Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published 29 days ago • 30