SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning Paper • 2505.02486 • Published May 5, 2025
KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation Paper • 2509.00366 • Published Aug 30, 2025
CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification Paper • 2603.01940 • Published Mar 2 • 24
PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios Paper • 2601.22575 • Published Jan 30 • 1
UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents Paper • 2605.29534 • Published 15 days ago • 15
OmniInteract: Benchmarking Real-World Streaming Interaction for Real-Time Omnimodal Assistants Paper • 2605.26485 • Published 17 days ago • 3
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research Paper • 2605.26114 • Published 18 days ago • 64
AURA: Always-On Understanding and Real-Time Assistance via Video Streams Paper • 2604.04184 • Published Apr 5 • 51
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding Paper • 2505.05446 • Published May 8, 2025
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments Paper • 2602.06075 • Published Feb 3 • 14
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents Paper • 2603.08013 • Published Mar 9 • 16
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28, 2025 • 85
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models Paper • 2311.07575 • Published Nov 13, 2023 • 15
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 6