
Libraries

How to use Umiharu/Qwen-4B-DB-AlfWorld-v9 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Umiharu/Qwen-4B-DB-AlfWorld-v9")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Umiharu/Qwen-4B-DB-AlfWorld-v9")
model = AutoModelForCausalLM.from_pretrained("Umiharu/Qwen-4B-DB-AlfWorld-v9")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Umiharu/Qwen-4B-DB-AlfWorld-v9 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Umiharu/Qwen-4B-DB-AlfWorld-v9"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Umiharu/Qwen-4B-DB-AlfWorld-v9",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
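
The same OpenAI-compatible request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the base URL assumes the server started above is listening on localhost port 8000.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server, so it is left commented out):
# req = build_chat_request(
#     "http://localhost:8000",
#     "Umiharu/Qwen-4B-DB-AlfWorld-v9",
#     "What is the capital of France?",
# )
# print(send_chat_request(req))
```

Separating request construction from sending makes the payload easy to inspect before any network call is made.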

Use Docker

docker model run hf.co/Umiharu/Qwen-4B-DB-AlfWorld-v9

SGLang

How to use Umiharu/Qwen-4B-DB-AlfWorld-v9 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Umiharu/Qwen-4B-DB-AlfWorld-v9" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Umiharu/Qwen-4B-DB-AlfWorld-v9",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Umiharu/Qwen-4B-DB-AlfWorld-v9" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Umiharu/Qwen-4B-DB-AlfWorld-v9",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Umiharu/Qwen-4B-DB-AlfWorld-v9 with Docker Model Runner:
```
docker model run hf.co/Umiharu/Qwen-4B-DB-AlfWorld-v9
```

Qwen-4B-DB-AlfWorld-v9

This repository provides a merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on four datasets: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3.

The LoRA adapter weights have been merged into the base model and saved here as a standalone model, so no separate adapter loading is required.

Dataset Notes (IMPORTANT)

For u-10bei/sft_alfworld_trajectory_dataset_v5, only samples meeting all of the following conditions were included in the training set:

input length ≤ 2048 tokens
trajectory_outcome == "success"
num_steps ≤ 35

These filters were applied to ensure training stability, reduce noisy or failed trajectories, and maintain consistency with the maximum sequence length used during training.
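
The filtering rule can be sketched as a small predicate. This is an illustration under stated assumptions, not the author's preprocessing script: the field names `trajectory_outcome` and `num_steps` come from the conditions listed here, while the `text` field and the whitespace-based `count_tokens` stand-in are hypothetical (a real pipeline would count tokens with the model tokenizer).

```python
MAX_INPUT_TOKENS = 2048  # matches the maximum sequence length used in training
MAX_STEPS = 35


def keep_sample(sample, input_token_count):
    """Return True only if the sample satisfies all three filter conditions."""
    return (
        input_token_count <= MAX_INPUT_TOKENS
        and sample["trajectory_outcome"] == "success"
        and sample["num_steps"] <= MAX_STEPS
    )


def filter_samples(samples, count_tokens):
    """Keep the samples that pass keep_sample, given a token-counting function."""
    return [s for s in samples if keep_sample(s, count_tokens(s))]


def count_tokens(sample):
    # Hypothetical stand-in: counts whitespace-separated words in a "text" field.
    return len(sample["text"].split())


samples = [
    {"text": "go to desk 1", "trajectory_outcome": "success", "num_steps": 12},
    {"text": "go to desk 1", "trajectory_outcome": "failure", "num_steps": 12},
    {"text": "go to desk 1", "trajectory_outcome": "success", "num_steps": 40},
]
kept = filter_samples(samples, count_tokens)
print(len(kept))  # 1
```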

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in multi-turn trajectories, enabling the model to learn environment observation, step-by-step reasoning, action execution, tool use, and recovery from errors.
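
One common way to apply loss only to assistant turns is to mask every other token in the label sequence. The sketch below illustrates that idea with made-up token IDs; the card does not state how masking was implemented during training, so the `build_labels` helper and the turn representation are assumptions.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_labels(turns):
    """Concatenate turns into input IDs and labels.

    `turns` is a list of (role, token_ids) pairs. Tokens from assistant turns
    keep their IDs as labels; all other tokens are masked with IGNORE_INDEX.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy trajectory with hypothetical token IDs: two assistant turns, both supervised.
turns = [
    ("user", [11, 12, 13]),   # environment observation
    ("assistant", [21, 22]),  # first action
    ("user", [14]),           # next observation
    ("assistant", [23]),      # second action
]
input_ids, labels = build_labels(turns)
print(labels)  # [-100, -100, -100, 21, 22, -100, 23]
```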

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: LoRA (merged into final weights)
Max sequence length: 2048
Learning rate: 1e-05
LoRA parameters used during training: r=8, alpha=16
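
To give a sense of what r=8 and alpha=16 imply, the sketch below works through the arithmetic of standard LoRA, where the low-rank update is scaled by alpha / r and adds r * (d_in + d_out) parameters per adapted weight matrix. The 1024 x 1024 matrix size is a hypothetical example, not a dimension taken from this model, and the card does not say which modules were adapted.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_added_params(d_in, d_out, r):
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 8, 16       # values from the training configuration above
D_IN = D_OUT = 1024    # hypothetical weight-matrix shape, for illustration only

scaling = lora_scaling(R, ALPHA)
added = lora_added_params(D_IN, D_OUT, R)
fraction = added / (D_IN * D_OUT)

print(scaling)   # 2.0
print(added)     # 16384
print(fraction)  # 0.015625
```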

Usage (Agent-style Inference Example)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Umiharu/Qwen-4B-DB-AlfWorld-v9"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "You are a household task-solving agent. Respond 'OK' if you are ready."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
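outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding; a temperature setting would be ignored here
)
# The decoded text contains the prompt followed by the model's continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))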

Sources & Terms (IMPORTANT)

Training data: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for Umiharu/Qwen-4B-DB-AlfWorld-v9

Base model: Qwen/Qwen3-4B-Instruct-2507
