A3-Qwen3.5-9B

This agent is GenericAgent from AgentLab, with its backbone model fine-tuned using the Agent-as-Annotators (A3) pipeline.

  • Model Name: A3-Qwen3.5-9B
  • Base Model: Qwen/Qwen3.5-9B
  • Model Architecture:
    • Type: Vision-Language Model (VLM)
    • Architecture: Causal LM with vision encoder
    • Number of Parameters: 9B
  • Input/Output Format:
    • Input: Accessibility tree + Set-of-Mark (SoM) screenshot
    • Output: Text action in BrowserGym format
    • Flags:
      GenericPromptFlags(
          obs=ObsFlags(
              use_html=False,
              use_ax_tree=True,
              use_tabs=True,
              use_focused_element=True,
              use_error_logs=True,
              use_history=True,
              use_past_error_logs=False,
              use_action_history=True,
              use_think_history=False,
              use_diff=False,
              html_type='pruned_html',
              use_screenshot=True,
              use_som=True,
              extract_visible_tag=True,
              extract_clickable_tag=True,
              extract_coords='False',
              filter_visible_elements_only=False,
          ),
          action=ActionFlags(
              action_set=HighLevelActionSetArgs(
                  subsets=('webarena',),
                  multiaction=False,
                  strict=False,
                  retry_with_force=True,
                  demo_mode='off',
              ),
              long_description=False,
              individual_examples=False,
          ),
          use_plan=False,
          use_criticise=False,
          use_thinking=True,
          use_memory=False,
          use_concrete_example=True,
          use_abstract_example=True,
          use_hints=True,
          enable_chat=False,
          max_prompt_tokens=57344,
          be_cautious=True,
          extra_instructions=None,
      )
      
  • Training Details:
    • Dataset: WebSynth trajectories collected via the A3 pipeline (agent-generated annotations on real websites)
    • Fine-tuning method: Supervised Fine-Tuning (SFT) with FSDP
    • Temperature at inference: 0.6
  • Paper Link: (forthcoming; COLM 2026 submission)
  • Code Repository: https://github.com/McGill-NLP/llm-annotators
  • License: Apache-2.0
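For reference, actions in BrowserGym's high-level format (the output format listed above) are Python-like call strings such as `click('42')`, where the argument is a Set-of-Mark element id. Below is a minimal, illustrative sketch of splitting such a string into a function name and arguments; it is not BrowserGym's own parser, and the specific actions shown (`click`, `fill`) are assumed examples from the `webarena` action subset.

```python
import re

def parse_action(action_str: str):
    """Split a BrowserGym-style action string such as "click('42')"
    into its function name and argument list.

    Illustrative only: a real action string can contain nested commas
    or quotes that this simple split does not handle.
    """
    m = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", action_str, re.DOTALL)
    if m is None:
        raise ValueError(f"Unrecognized action: {action_str!r}")
    name, raw_args = m.group(1), m.group(2)
    # Split on commas and strip surrounding quotes from each argument.
    args = (
        [a.strip().strip("'\"") for a in raw_args.split(",")]
        if raw_args.strip()
        else []
    )
    return name, args

name, args = parse_action("click('42')")
print(name, args)  # click ['42']
```

Since `multiaction=False` in the flags above, the model is expected to emit exactly one such call per step.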