Add A3-Qwen3.5-9B results (WebArena, VisualWebArena, WorkArena-L1, MiniWoB)

#12
by xhluca - opened

A3-Qwen3.5-9B

A 9B vision-language model fine-tuned using the Agent-as-Annotators (A3) pipeline on WebSynth trajectories.

Results

| Benchmark | Score | Std Err |
|---|---|---|
| WebArena | 42.1 | 1.7 |
| VisualWebArena | 33.7 | 1.6 |
| WorkArena-L1 | 51.5 | 2.8 |
| MiniWoB | 69.0 | 1.9 |
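
For readers comparing these numbers against other entries, here is a minimal sketch that turns each reported score and standard error into an approximate 95% interval. It assumes the score and the standard error are in the same units and that a normal approximation (score ± 1.96 × std err) is acceptable; neither assumption is stated in this PR. The names `RESULTS`, `Z_95`, and `confidence_interval` are illustrative only.

```python
# Scores and standard errors copied from the results table in this PR.
RESULTS = {
    "WebArena": (42.1, 1.7),
    "VisualWebArena": (33.7, 1.6),
    "WorkArena-L1": (51.5, 2.8),
    "MiniWoB": (69.0, 1.9),
}

# Assumption: normal approximation, two-sided 95% critical value.
Z_95 = 1.96


def confidence_interval(score, std_err, z=Z_95):
    """Return (low, high) for score +/- z * std_err, rounded to one decimal."""
    margin = z * std_err
    return (round(score - margin, 1), round(score + margin, 1))


# Approximate interval for every benchmark in the table.
INTERVALS = {name: confidence_interval(*vals) for name, vals in RESULTS.items()}

if __name__ == "__main__":
    for name, (low, high) in INTERVALS.items():
        print(f"{name}: [{low}, {high}]")
```

Two entries whose intervals overlap substantially should not be read as clearly ranked; this is a rough reading aid, not a significance test.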
ServiceNow org

LGTM!

jaiswala changed pull request status to merged
