xhluca committed on
Commit 91c446a · verified · 1 Parent(s): 56da91d

Set benchmark_tuned=Yes for WebArena

results/A3-Qwen3.5-9B/webarena.json CHANGED
@@ -7,7 +7,7 @@
  "score": 42.1,
  "std_err": 1.7,
  "benchmark_specific": "No",
- "benchmark_tuned": "No",
+ "benchmark_tuned": "Yes",
  "followed_evaluation_protocol": "Yes",
  "reproducible": "Yes",
  "comments": "812 tasks (full benchmark). Fine-tuned on A3-Synth trajectories via the Agent-as-Annotators framework. A3-Synth is derived from WebArena environments but uses entirely different, synthetically generated tasks.",