Spaces: ServiceNow/browsergym-leaderboard

xhluca committed 6 days ago

Commit 91c446a (verified) · 1 Parent(s): 56da91d

Set benchmark_tuned=Yes for WebArena
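The flag flip named in the commit title can be sketched programmatically. This is a minimal illustration, not the leaderboard's own tooling: `set_flag` and `FLAG_FIELDS` are hypothetical names, the restriction to "Yes"/"No" is an assumption inferred from the values visible in this file, and the entry dict holds only the fields visible in the changed hunk (the rest of `webarena.json` is not shown here, so it is left out).

```python
# Hypothetical helper: flips one Yes/No flag on a single results entry.
# The set of flag fields and the allowed values are assumptions based on
# the fields visible in this commit's diff.
FLAG_FIELDS = (
    "benchmark_specific",
    "benchmark_tuned",
    "followed_evaluation_protocol",
    "reproducible",
)


def set_flag(entry: dict, field: str, value: str) -> dict:
    """Return a copy of `entry` with `field` set to "Yes" or "No"."""
    if field not in FLAG_FIELDS:
        raise KeyError(f"unknown flag field: {field}")
    if value not in ("Yes", "No"):
        raise ValueError(f"flag value must be 'Yes' or 'No', got {value!r}")
    updated = dict(entry)  # shallow copy so the original stays untouched
    updated[field] = value
    return updated


# Entry fragment with the values shown in the diff (comments omitted).
entry = {
    "score": 42.1,
    "std_err": 1.7,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
}

updated = set_flag(entry, "benchmark_tuned", "Yes")
changed = [key for key in entry if entry[key] != updated[key]]
print(changed)
```

Returning a copy rather than mutating in place makes it easy to compare the old and new entries and confirm that only the intended field differs.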

Files changed (1)

results/A3-Qwen3.5-9B/webarena.json CHANGED

@@ -7,7 +7,7 @@
         "score": 42.1,
         "std_err": 1.7,
         "benchmark_specific": "No",
-        "benchmark_tuned": "No",
+        "benchmark_tuned": "Yes",
         "followed_evaluation_protocol": "Yes",
         "reproducible": "Yes",
         "comments": "812 tasks (full benchmark). Fine-tuned on A3-Synth trajectories via the Agent-as-Annotators framework. A3-Synth is derived from WebArena environments but uses entirely different, synthetically generated tasks.",