A3-Qwen3.5-9B

This agent is GenericAgent from AgentLab, with its backbone model fine-tuned using the Agent-as-Annotators (A3) pipeline.

  • Model Name: A3-Qwen3.5-9B
  • Base Model: Qwen/Qwen3.5-9B
  • Model Architecture:
    • Type: Vision-Language Model (VLM)
    • Architecture: Causal LM with vision encoder
    • Number of Parameters: 9B
  • Input/Output Format:
    • Input: Accessibility tree + Set-of-Mark (SoM) screenshot
    • Output: Text action in BrowserGym format
    • Flags:
      GenericPromptFlags(
          obs=ObsFlags(
              use_html=False,
              use_ax_tree=True,
              use_tabs=True,
              use_focused_element=True,
              use_error_logs=True,
              use_history=True,
              use_past_error_logs=False,
              use_action_history=True,
              use_think_history=False,
              use_diff=False,
              html_type='pruned_html',
              use_screenshot=True,
              use_som=True,
              extract_visible_tag=True,
              extract_clickable_tag=True,
              extract_coords='False',
              filter_visible_elements_only=False,
          ),
          action=ActionFlags(
              action_set=HighLevelActionSetArgs(
                  subsets=('webarena',),
                  multiaction=False,
                  strict=False,
                  retry_with_force=True,
                  demo_mode='off',
              ),
              long_description=False,
              individual_examples=False,
          ),
          use_plan=False,
          use_criticise=False,
          use_thinking=True,
          use_memory=False,
          use_concrete_example=True,
          use_abstract_example=True,
          use_hints=True,
          enable_chat=False,
          max_prompt_tokens=57344,
          be_cautious=True,
          extra_instructions=None,
      )
      
  • Training Details:
    • Dataset: WebSynth trajectories collected via the A3 pipeline (agent-generated annotations on real websites)
    • Fine-tuning method: Supervised Fine-Tuning (SFT) with FSDP
    • Temperature at inference: 0.6
  • Paper Link: (forthcoming; COLM 2026 submission)
  • Code Repository: https://github.com/McGill-NLP/llm-annotators
  • License: Apache-2.0
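For reference, actions in BrowserGym's high-level format (the output format listed above) are Python-like call strings such as `click('42')`, where the argument is a Set-of-Mark element id. Below is a minimal, illustrative sketch of splitting such a string into a function name and arguments; it is not BrowserGym's own parser, and the specific actions shown (`click`, `fill`) are assumed examples from the `webarena` action subset.

```python
import re

def parse_action(action_str: str):
    """Split a BrowserGym-style action string such as "click('42')"
    into its function name and argument list.

    Illustrative only: a real action string can contain nested commas
    or quotes that this simple split does not handle.
    """
    m = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", action_str, re.DOTALL)
    if m is None:
        raise ValueError(f"Unrecognized action: {action_str!r}")
    name, raw_args = m.group(1), m.group(2)
    # Split on commas and strip surrounding quotes from each argument.
    args = (
        [a.strip().strip("'\"") for a in raw_args.split(",")]
        if raw_args.strip()
        else []
    )
    return name, args

name, args = parse_action("click('42')")
print(name, args)  # click ['42']
```

Since `multiaction=False` in the flags above, the model is expected to emit exactly one such call per step.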