exp01_dpo

This repository provides a LoRA adapter fine-tuned with QLoRA (4-bit, Unsloth) on top of the base model Qwen/Qwen3-4B-Instruct-2507.

[Important] This repository contains only the LoRA adapter weights. The base model must be loaded separately.

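Training Objective

This adapter was trained to improve the accuracy of structured output (JSON / YAML / XML / TOML / CSV).

During training, the loss is applied only to the final assistant output; the intermediate reasoning process (Chain-of-Thought) is masked.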
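The masking described above can be sketched as follows. This is a minimal illustration, not the training code used for this adapter: the function name `mask_labels`, the `answer_start` index, and the token ids are all hypothetical. It assumes the common convention in which label positions set to -100 are ignored by the cross-entropy loss.

```python
# -100 is the label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_labels(input_ids, answer_start):
    """Return labels in which every token before answer_start is excluded from the loss.

    answer_start is the (hypothetical) index of the first token of the final
    assistant output; the prompt and any reasoning tokens come before it.
    """
    if not 0 <= answer_start <= len(input_ids):
        raise ValueError("answer_start must lie within the sequence")
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])


# Hypothetical token ids: 3 prompt tokens, 2 reasoning tokens, 3 final-answer tokens.
example_ids = [11, 12, 13, 21, 22, 31, 32, 33]
example_labels = mask_labels(example_ids, answer_start=5)
print(example_labels)
```

Only the last three positions keep their token ids, so only the final answer contributes to the gradient.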

Model Details

Base Model: Qwen/Qwen3-4B-Instruct-2507
Training Type: sft
Framework: Unsloth + TRL + PEFT
Generated: 2026-02-06 05:55:41

Training Configuration

Dataset

Parameter	Value
Dataset ID	`N/A`
Max Sequence Length	N/A
Validation Ratio	N/A
Holdout Ratio	N/A
CoT Masking	true (loss applied to assistant output only)
Upsampling	N/A

LoRA Configuration

Parameter	Value
Rank (r)	16
Alpha	32
Dropout	0.05
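To make the rank and alpha values concrete, the sketch below computes the scaling factor and the number of added parameters for a single linear layer. It assumes the standard LoRA scaling of alpha / r (the model card does not say whether a variant such as rank-stabilized scaling was used), and the 1024 x 1024 layer size is a hypothetical example, not a dimension of the base model.

```python
def lora_scaling(alpha, rank):
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 32  # values from the table above

scaling = lora_scaling(ALPHA, RANK)
# Hypothetical 1024 x 1024 projection, chosen only for illustration.
extra_params = lora_param_count(1024, 1024, RANK)
print(scaling, extra_params)
```

With these settings the low-rank update is scaled by 2.0, and the hypothetical layer gains 32,768 trainable parameters against 1,048,576 frozen ones.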

Training Hyperparameters

Parameter	Value
Epochs	N/A
Batch Size	N/A
Gradient Accumulation	N/A
Learning Rate	N/A
Warmup Ratio	N/A
Weight Decay	N/A
Seed	N/A

Usage

With PEFT

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "kevineen/Qwen3-4B-instruct-2507-exp01-dpo")

With Unsloth (Faster)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="kevineen/Qwen3-4B-instruct-2507-exp01-dpo",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
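Because the adapter targets structured output, it can be useful to check that generated text actually parses. The helper below is a minimal sketch for the JSON case using only the standard library; `parse_json_output` is a hypothetical name and is not part of PEFT, Unsloth, or this repository. It assumes the model may wrap its answer in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_output(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line and, if present, the closing fence line.
        lines = cleaned.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


print(parse_json_output('{"name": "example", "tags": ["a", "b"]}'))
print(parse_json_output("this is not JSON"))
```

The same pattern extends to the other formats with the matching parser, for example `xml.etree.ElementTree` for XML or `csv` for CSV.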

Training Command

uv run matsuo-train --config configs/experiments/exp01_dpo.yaml

License

This adapter is released under the Apache 2.0 License.

Sources and Licenses (Important: must always be stated)

Training Data

Dataset: N/A
Dataset License: CC-BY-4.0

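About the Dataset License

This dataset may be used and redistributed under the terms of the CC-BY-4.0 license.

Compliance Requirements

Users must comply with both of the following:

The attribution (credit) requirements of the dataset
The original terms of use of the base model

Base model: Qwen/Qwen3-4B-Instruct-2507 must be used in accordance with its original terms of use.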

Citation

If you use this model, please cite:

@misc{exp01_dpo,
  title={StructEval-T: exp01_dpo},
  author={kevineen},
  year={2026},
  note={Fine-tuned for structured output generation}
}

Generated by matsuo-lastexam submission tools
