MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments • Paper • 2604.13418 • Published 8 days ago • 6
PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference • Paper • 2603.02479 • Published Mar 3 • 20
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models • Paper • 2506.01062 • Published Jun 1, 2025 • 5