Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering Paper • 2605.29648 • Published 19 days ago
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention Paper • 2605.29548 • Published 19 days ago
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation Paper • 2605.29861 • Published 19 days ago
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 18 days ago
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 11 days ago
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders Paper • 2606.07473 • Published 11 days ago
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning Paper • 2606.07299 • Published 11 days ago
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory Paper • 2606.09365 • Published 8 days ago
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution Paper • 2606.10917 • Published 7 days ago
Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics Paper • 2606.12476 • Published 6 days ago
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO Paper • 2605.30789 • Published 14 days ago