Qianhui WU

qianhuiwu

https://www.linkedin.com/in/qianhui-wu-2b1608b7/

AI & ML interests

None yet

Recent Activity

upvoted a paper about 5 hours ago

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Paper • 2606.05922 • Published 6 days ago • 36

updated a dataset 8 days ago

OpenWebRL/OpenWebRL-RL-Tasks

Viewer • Updated 8 days ago • 2.2k • 28

updated a dataset 9 days ago

OpenWebRL/OpenWebRL-SFT-Trajectories

Viewer • Updated 9 days ago • 3.09k • 29

published a dataset 15 days ago

OpenWebRL/OpenWebRL-SFT-Trajectories

Viewer • Updated 9 days ago • 3.09k • 29

upvoted an article 19 days ago

A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond

Article • karina-zadorozhny • Jan 19 • 26

upvoted a paper 26 days ago

Orchard: An Open-Source Agentic Modeling Framework

Paper • 2605.15040 • Published 27 days ago • 20

submitted a paper to Daily Papers 26 days ago

Orchard: An Open-Source Agentic Modeling Framework

Paper • 2605.15040 • Published 27 days ago • 20

upvoted a paper about 1 month ago

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

Paper • 2604.27711 • Published Apr 30 • 41

liked a dataset 2 months ago

GUI-Libra/GUI-Libra-81K-RL

Viewer • Updated Feb 24 • 738 • 317 • 1

authored 7 papers 3 months ago

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Paper • 2602.22190 • Published Feb 25 • 17

liked a model 3 months ago

ChilleD/SynthAgent

Updated Feb 5 • 3

upvoted a paper 3 months ago

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Paper • 2602.22190 • Published Feb 25 • 17

upvoted 2 papers 7 months ago

GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

Paper • 2511.04307 • Published Nov 6, 2025 • 16

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Paper • 2510.23538 • Published Oct 27, 2025 • 99
