MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge • arXiv:2507.21183 • Published Jul 27, 2025
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE • arXiv:2507.21802 • Published Jul 29, 2025
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity • arXiv:2507.21848 • Published Jul 29, 2025
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • arXiv:2508.16153 • Published Aug 22, 2025
Towards a Unified View of Large Language Model Post-Training • arXiv:2509.04419 • Published Sep 4, 2025
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting • arXiv:2509.11452 • Published Sep 14, 2025
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense • arXiv:2510.07242 • Published Oct 8, 2025
Reinforcing Diffusion Models by Direct Group Preference Optimization • arXiv:2510.08425 • Published Oct 9, 2025
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs • arXiv:2509.25771 • Published Sep 30, 2025
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs • arXiv:2511.07419 • Published Nov 10, 2025
Video Generation Models Are Good Latent Reward Models • arXiv:2511.21541 • Published Nov 26, 2025
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs • arXiv:2601.08763 • Published Jan 13, 2026
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting • arXiv:2601.02151 • Published Jan 5, 2026
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models • arXiv:2603.13985 • Published Mar 14, 2026
AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers • arXiv:2603.14706 • Published Mar 16, 2026