---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react
language:
- en
license: mit
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
- merge
- dare-ties
---

# qwen3-4b-agent-trajectory-lora (Merged Model)

This repository provides a **LoRA adapter** fine-tuned from **Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

**[Merge Information]**

This adapter was created with **MergeKit (DARE-TIES)** to combine the strengths of the following two models:

1. `maru-miya/lora_agentbench_qwen3_4b_d20_t1` (general agent focus)
2. `maru-miya/lora_agentbench_qwen3_4b_d21_t9_db` (DB-specialized focus)

This repository contains **LoRA adapter weights only**; the base model must be loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations). Loss is applied to **all assistant turns** in each multi-turn trajectory, so the model learns environment observation, action selection, tool use, and recovery from errors.
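As background on the merge method: DARE first randomly drops a fraction (1 - density) of each task vector (the delta between a fine-tuned model and the base) and rescales the survivors by 1/density; TIES then resolves sign conflicts between models by electing a majority sign per parameter and summing only the agreeing contributions. A toy sketch on flat delta vectors, using the per-model weights and density listed in the configuration section (illustrative only; `dare_ties_merge` is a hypothetical helper, not MergeKit's actual implementation):

```python
import numpy as np

def dare_ties_merge(deltas, weights, density, rng):
    """Toy DARE-TIES merge of flat parameter-delta vectors.

    deltas  : list of np.ndarray task vectors (fine-tuned minus base)
    weights : per-model scaling factors (e.g. [1.0, 1.2])
    density : fraction of each delta kept by DARE's random drop (e.g. 0.7)
    """
    pruned = []
    for d, w in zip(deltas, weights):
        # DARE: randomly drop (1 - density) of the delta, rescale survivors
        mask = rng.random(d.shape) < density
        pruned.append(w * d * mask / density)
    stacked = np.stack(pruned)
    # TIES: elect a per-parameter sign by summed (weighted) magnitude
    elected = np.sign(stacked.sum(axis=0))
    # keep only contributions that agree with the elected sign, then sum
    agree = np.sign(stacked) == elected
    return np.where(agree, stacked, 0.0).sum(axis=0)
```

With `density=1.0` nothing is dropped, so two identical deltas simply produce the weighted sum; at `density=0.7` (as used here) roughly 30% of each delta is zeroed at random and the rest is scaled up to compensate in expectation.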
## Training & Merge Configuration

**[Merge Settings (MergeKit)]**

- Method: DARE-TIES
- Base model for merge: Qwen/Qwen3-4B-Instruct-2507
- Models & parameters:
  - `maru-miya/lora_agentbench_qwen3_4b_d20_t1` (weight: 1.0, density: 0.7)
  - `maru-miya/lora_agentbench_qwen3_4b_d21_t9_db` (weight: 1.2, density: 0.7)

**[Original LoRA Adapter 1: d20_t1]**

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full-precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=16, alpha=32

**[Original LoRA Adapter 2: d21_t9_db]**

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full-precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-05
- LoRA: r=16, alpha=32

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "maru-miya/merged_agentbench_qwen3_4b_dare_ties_t1"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```

## Sources & Terms (IMPORTANT)

Training data:

- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react

This repository does NOT redistribute the datasets. Users must comply with the dataset licenses and the base model's terms of use.
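For reference, the merge settings listed under Training & Merge Configuration roughly correspond to a MergeKit configuration along these lines. This is a sketch, not the actual config file used for this repository; note also that MergeKit's `dare_ties` method operates on full model weights, so each LoRA adapter would typically be merged into the base model before running the merge:

```yaml
merge_method: dare_ties
base_model: Qwen/Qwen3-4B-Instruct-2507
models:
  - model: maru-miya/lora_agentbench_qwen3_4b_d20_t1
    parameters:
      weight: 1.0
      density: 0.7
  - model: maru-miya/lora_agentbench_qwen3_4b_d21_t9_db
    parameters:
      weight: 1.2
      density: 0.7
dtype: bfloat16
```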