qwen3-4b-agent-trajectory-lora (Merged Model)

This repository provides a merged LoRA adapter for Qwen/Qwen3-4B-Instruct-2507. The source adapters were fine-tuned with LoRA + Unsloth.

[Merge Information] This model is a merged adapter created with MergeKit (DARE-TIES), combining the strengths of the following two adapters:

  1. maru-miya/lora_agentbench_qwen3_4b_d20_t1 (General Agent focus)
  2. maru-miya/lora_agentbench_qwen3_4b_d21_t9_db (DB specialized focus)

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in each multi-turn trajectory, so the model learns to interpret environment observations, select actions, use tools, and recover from errors.
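Applying loss to all assistant turns is typically done by masking non-assistant tokens out of the labels. A minimal sketch of that idea, assuming a per-token role annotation (the function and variable names here are illustrative, not the actual training code):

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores targets with this value


def mask_labels(input_ids, token_roles):
    """Keep loss only on assistant tokens; mask everything else.

    input_ids:   list of token ids for the full multi-turn trajectory
    token_roles: list of role strings ("system"/"user"/"assistant"),
                 one per token (an assumed annotation for illustration).
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(input_ids, token_roles)
    ]
```

With labels built this way, user and system tokens still condition the model but contribute nothing to the gradient.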

Training & Merge Configuration

[Merge Settings (MergeKit)]

  • Method: DARE-TIES
  • Base model for merge: Qwen/Qwen3-4B-Instruct-2507
  • Models & Parameters:
    • maru-miya/lora_agentbench_qwen3_4b_d20_t1 (weight: 1.0, density: 0.7)
    • maru-miya/lora_agentbench_qwen3_4b_d21_t9_db (weight: 1.2, density: 0.7)
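The settings above can be expressed as a MergeKit config file. The actual config is not published in this repository, so the following is a reconstruction from the listed parameters (the `dtype` choice is an assumption based on the F16 weights):

```yaml
merge_method: dare_ties
base_model: Qwen/Qwen3-4B-Instruct-2507
models:
  - model: maru-miya/lora_agentbench_qwen3_4b_d20_t1
    parameters:
      weight: 1.0
      density: 0.7
  - model: maru-miya/lora_agentbench_qwen3_4b_d21_t9_db
    parameters:
      weight: 1.2
      density: 0.7
dtype: float16
```

Here `density` controls the fraction of delta parameters kept by DARE's random drop-and-rescale step, and `weight` scales each adapter's contribution (the DB adapter is weighted slightly higher).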

[Original LoRA Adapter 1: d20_t1]

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-06
  • LoRA: r=16, alpha=32

[Original LoRA Adapter 2: d21_t9_db]

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-05
  • LoRA: r=16, alpha=32
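Both adapters use r=16, alpha=32, which in standard LoRA math means the low-rank update is scaled by alpha / r = 2.0 before being added to the frozen weight. A small illustrative sketch (shapes and names are for illustration only, not the actual training setup):

```python
import numpy as np

d, r, alpha = 64, 16, 32          # d is a stand-in hidden size
W = np.zeros((d, d))              # frozen base weight (toy values)
A = np.random.randn(r, d) * 0.01  # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, zero-initialized

scaling = alpha / r               # 32 / 16 = 2.0
W_effective = W + scaling * (B @ A)
# At initialization B is zero, so W_effective equals W and the
# adapter starts as a no-op; training moves B and A away from that.
```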

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "maru-miya/merged_agentbench_qwen3_4b_dare_ties_t1"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model in fp16 and let device_map place it automatically.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the merged LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter)
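Once the adapter is attached, generation works as with any chat model. A brief illustrative continuation (the prompt is an example, not from the training data):

```python
# Example: ask the agent for its next action in an ALFWorld-style task.
messages = [
    {"role": "user", "content": "You are in a kitchen. Your task: find a mug."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```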

Sources & Terms (IMPORTANT)
Training data:

  • u-10bei/sft_alfworld_trajectory_dataset_v5
  • u-10bei/dbbench_sft_dataset_react

This repository does NOT redistribute the datasets. Users must comply with the dataset licenses and the base model's terms of use.