Add A3-Qwen3.5-9B results (WebArena, VisualWebArena, WorkArena-L1, MiniWoB)
#12
by xhluca - opened
A3-Qwen3.5-9B
A 9B vision-language model fine-tuned using the Agent-as-Annotators (A3) pipeline on WebSynth trajectories.
Results
| Benchmark | Success rate (%) | Std Err |
|---|---|---|
| WebArena | 42.1 | 1.7 |
| VisualWebArena | 33.7 | 1.6 |
| WorkArena-L1 | 51.5 | 2.8 |
| MiniWoB | 69.0 | 1.9 |
- Base model: Qwen3.5-9B
- Agent: GenericAgent from AgentLab
- Code: https://github.com/McGill-NLP/agent-as-annotators
- Paper: https://arxiv.org/abs/2604.07776
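
For readers checking the Std Err column, here is a minimal sketch of how a standard error on a success rate can be computed. It assumes a binomial model (one independent pass/fail episode per task), which this PR does not confirm as the method used. The helper name `binomial_std_err` is hypothetical, and the task count of 812 for WebArena is an assumption about the evaluation set size, not a figure stated here.

```python
import math


def binomial_std_err(success_rate_pct: float, num_tasks: int) -> float:
    """Standard error, in percentage points, of a success rate.

    Assumes each task is an independent pass/fail trial (binomial model).
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    p = success_rate_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / num_tasks)


# Assumption: WebArena evaluated with one episode on each of 812 tasks.
webarena_se = binomial_std_err(42.1, 812)
print(f"WebArena std err: {webarena_se:.1f}")
```

Under these assumptions the result rounds to the 1.7 reported in the table. A mismatch on another benchmark would more likely point to a different task count or estimator than to an error in the reported number.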
LGTM!
jaiswala changed pull request status to merged