Set benchmark_tuned=Yes for WebArena
results/A3-Qwen3.5-9B/webarena.json
CHANGED
@@ -7,7 +7,7 @@
 "score": 42.1,
 "std_err": 1.7,
 "benchmark_specific": "No",
-"benchmark_tuned": "
+"benchmark_tuned": "Yes",
 "followed_evaluation_protocol": "Yes",
 "reproducible": "Yes",
 "comments": "812 tasks (full benchmark). Fine-tuned on A3-Synth trajectories via the Agent-as-Annotators framework. A3-Synth is derived from WebArena environments but uses entirely different, synthetically generated tasks.",