Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts Paper • 2606.05922 • Published 6 days ago • 36
A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond Article • karina-zadorozhny • Jan 19 • 26
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control Paper • 2604.27711 • Published Apr 30 • 41
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension Paper • 2406.02536 • Published Jun 4, 2024
Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text Paper • 2211.11300 • Published Nov 21, 2022 • 1
On Memory Construction and Retrieval for Personalized Conversational Agents Paper • 2502.05589 • Published Feb 8, 2025
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents Paper • 2510.09577 • Published Oct 10, 2025 • 8
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Paper • 2511.04307 • Published Nov 6, 2025 • 16
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL Paper • 2602.22190 • Published Feb 25 • 17
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 99