papers-to-read
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published • 282
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
• Published • 254
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published • 263
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published • 166
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
• 2507.15846
• Published • 135
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published • 123
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper
• 2507.02592
• Published • 126
4KAgent: Agentic Any Image to 4K Super-Resolution
Paper
• 2507.07105
• Published • 107
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Paper
• 2507.22827
• Published • 101
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published • 240
Paper
• 2508.10104
• Published • 305
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published • 211
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper
• 2508.04026
• Published • 164
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper
• 2508.05629
• Published • 191
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published • 162
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published • 665
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper
• 2508.18106
• Published • 350
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published • 238
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Paper
• 2508.21148
• Published • 142
Why Language Models Hallucinate
Paper
• 2509.04664
• Published • 199
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Paper
• 2509.13312
• Published • 106
Scaling Agents via Continual Pre-training
Paper
• 2509.13310
• Published • 117
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper
• 2509.06501
• Published • 82
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published • 76
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
• 2509.01055
• Published • 81
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper
• 2509.06806
• Published • 63
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published • 118