- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 69
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
  Paper • 2502.06060 • Published • 38
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 195
- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100
Collections
Collections including paper arxiv:2604.03016
- Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
  Paper • 2601.22060 • Published • 155
- Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
  Paper • 2602.02185 • Published • 118
- SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
  Paper • 2603.23483 • Published • 61
- WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
  Paper • 2603.19708 • Published • 13

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13

- HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
  Paper • 2603.17024 • Published • 109
- WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
  Paper • 2603.19708 • Published • 13
- MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
  Paper • 2603.25319 • Published • 32
- ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
  Paper • 2603.25791 • Published • 5

- Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
  Paper • 2505.13227 • Published • 45
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 1.37k • 556
- nvidia/OpenMathReasoning
  Viewer • Updated • 5.68M • 19.7k • 453
- Search Arena: Analyzing Search-Augmented LLMs
  Paper • 2506.05334 • Published • 18

- LLM Agent Operating System
  Paper • 2403.16971 • Published • 73
- Real-Time Reasoning Agents in Evolving Environments
  Paper • 2511.04898 • Published • 13
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 61
- Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
  Paper • 2604.03016 • Published • 36