qwen3-4b-agent-trajectory-lora (Merged Model)

This repository provides a merged LoRA adapter for Qwen/Qwen3-4B-Instruct-2507. The source adapters were fine-tuned with LoRA + Unsloth.

[Merge Information] This model is a merged adapter created with MergeKit (DARE-TIES), combining the strengths of the following two adapters:

  1. maru-miya/lora_agentbench_qwen3_4b_d20_t1 (General Agent focus)
  2. maru-miya/lora_agentbench_qwen3_4b_d21_t9_db (DB specialized focus)

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in each multi-turn trajectory, so the model learns to interpret environment observations, select actions, use tools, and recover from errors.
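Applying loss to all assistant turns is typically done by masking non-assistant tokens out of the labels. A minimal sketch of that idea, assuming a per-token role annotation (the function and variable names here are illustrative, not the actual training code):

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores targets with this value


def mask_labels(input_ids, token_roles):
    """Keep loss only on assistant tokens; mask everything else.

    input_ids:   list of token ids for the full multi-turn trajectory
    token_roles: list of role strings ("system"/"user"/"assistant"),
                 one per token (an assumed annotation for illustration).
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(input_ids, token_roles)
    ]
```

With labels built this way, user and system tokens still condition the model but contribute nothing to the gradient.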

Training & Merge Configuration

[Merge Settings (MergeKit)]

  • Method: DARE-TIES
  • Base model for merge: Qwen/Qwen3-4B-Instruct-2507
  • Models & Parameters:
    • maru-miya/lora_agentbench_qwen3_4b_d20_t1 (weight: 1.0, density: 0.7)
    • maru-miya/lora_agentbench_qwen3_4b_d21_t9_db (weight: 1.2, density: 0.7)
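The settings above can be expressed as a MergeKit config file. The actual config is not published in this repository, so the following is a reconstruction from the listed parameters (the `dtype` choice is an assumption based on the F16 weights):

```yaml
merge_method: dare_ties
base_model: Qwen/Qwen3-4B-Instruct-2507
models:
  - model: maru-miya/lora_agentbench_qwen3_4b_d20_t1
    parameters:
      weight: 1.0
      density: 0.7
  - model: maru-miya/lora_agentbench_qwen3_4b_d21_t9_db
    parameters:
      weight: 1.2
      density: 0.7
dtype: float16
```

Here `density` controls the fraction of delta parameters kept by DARE's random drop-and-rescale step, and `weight` scales each adapter's contribution (the DB adapter is weighted slightly higher).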

[Original LoRA Adapter 1: d20_t1]

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-06
  • LoRA: r=16, alpha=32

[Original LoRA Adapter 2: d21_t9_db]

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-05
  • LoRA: r=16, alpha=32
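Both adapters use r=16, alpha=32, which in standard LoRA math means the low-rank update is scaled by alpha / r = 2.0 before being added to the frozen weight. A small illustrative sketch (shapes and names are for illustration only, not the actual training setup):

```python
import numpy as np

d, r, alpha = 64, 16, 32          # d is a stand-in hidden size
W = np.zeros((d, d))              # frozen base weight (toy values)
A = np.random.randn(r, d) * 0.01  # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, zero-initialized

scaling = alpha / r               # 32 / 16 = 2.0
W_effective = W + scaling * (B @ A)
# At initialization B is zero, so W_effective equals W and the
# adapter starts as a no-op; training moves B and A away from that.
```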

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "maru-miya/merged_agentbench_qwen3_4b_dare_ties_t1"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model in fp16 and let device_map place it automatically.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the merged LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter)
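Once the adapter is attached, generation works as with any chat model. A brief illustrative continuation (the prompt is an example, not from the training data):

```python
# Example: ask the agent for its next action in an ALFWorld-style task.
messages = [
    {"role": "user", "content": "You are in a kitchen. Your task: find a mug."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```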

Sources & Terms (IMPORTANT)
Training data:

  • u-10bei/sft_alfworld_trajectory_dataset_v5
  • u-10bei/dbbench_sft_dataset_react

This repository does NOT redistribute the datasets. Users must comply with the dataset licenses and the base model's terms of use.