# Qwen3.5-2b-Kimi-and-Opus-Distillation
This model is a distilled version of Qwen 3.5 (2B), fine-tuned on high-quality reasoning and conversational datasets. The distillation process uses responses from Kimi K2.5 and Claude 4.6 Opus to strengthen the reasoning capabilities and conversational depth of the lightweight 2B-parameter model.
## Model Details
- Developed by: ertghiu256
- Base Model: Qwen/Qwen3.5-2B
- Language(s): English
- License: Apache 2.0
- Finetuning Technique: Supervised Fine-Tuning (SFT) / Distillation
## Training Metadata
The model was trained with an emphasis on high-quality data density rather than sheer volume, focusing on a specific subset of data to optimize performance within a short training window.
- Training Hardware: 1x T4 (Google Colab/GCP)
- Total Training Time: ~2 Hours
- Max Training Steps: 70
- Learning Rate: 1e-4
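For reference, the reported run settings can be collected into a single configuration object. This is only a summary of the values stated above; the actual training script, and any unstated settings such as batch size, optimizer, or scheduler, has not been published.

```python
# Summary of the reported training run as a plain Python dict.
# Only values stated in this card are included; batch size, optimizer,
# and scheduler were not published, so they are deliberately left out.
training_config = {
    "learning_rate": 1e-4,          # reported learning rate
    "max_steps": 70,                # reported max training steps
    "hardware": "1x NVIDIA T4",     # Google Colab / GCP
    "approx_wall_time_hours": 2.0,  # reported total training time
}
```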
## Dataset Composition
The model was trained on a curated mixture of the following three datasets:
- Ali-Yaser/KIMI-K2.5-450000x-ShareGPT: Large-scale conversational data distilled from Kimi, providing natural, helpful, and long-context-aware dialogue.
- allenai/tulu-3-sft-mixture: A diverse, high-quality SFT mixture designed to improve general instruction-following capabilities.
- nohurry/Opus-4.6-Reasoning-3000x-filtered: A highly filtered set of complex reasoning chains distilled from Claude 4.6 Opus, designed to improve the model's logical "Chain of Thought" (CoT) processes.
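One way to combine such sources is weighted sampling, where each training example is drawn from one dataset according to a mixing ratio. The sketch below illustrates that idea; the actual mixing ratios for this model are not published, so the weights here are placeholders, not the real recipe.

```python
import random

# Hypothetical mixing weights for illustration only -- the card does not
# state the real ratios used during training.
SOURCES = {
    "Ali-Yaser/KIMI-K2.5-450000x-ShareGPT": 0.5,
    "allenai/tulu-3-sft-mixture": 0.3,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 0.2,
}

def sample_source(rng: random.Random) -> str:
    """Pick which dataset the next training example is drawn from."""
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Draw 1000 examples' worth of source assignments with a fixed seed.
rng = random.Random(0)
draws = [sample_source(rng) for _ in range(1000)]
```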
## Intended Use
- Reasoning-heavy tasks: Despite its size, the model is tuned to handle logical queries better than the base Qwen3.5-2B model.
- Mobile/Edge Deployment: Due to its 2B parameter count, it is ideal for local-first applications.
- Conversational AI: High-quality dialogue based on Kimi and Opus styles.
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ertghiu256/Qwen3.5-2b-Kimi-and-Opus-Distillation"

# Load the model and tokenizer (device_map="auto" places weights on GPU if available)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the importance of distillation in small language models."
messages = [
    {"role": "system", "content": "You are a helpful and logical assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the model's reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Limitations and Biases
While the distillation from Opus and Kimi provides a significant boost, the model's 2B parameter size still limits its "world knowledge" compared to larger models. Users should verify factual claims. The model may inherit biases present in the distillation source datasets.
## Uploaded Finetuned Model
- Developed by: ertghiu256
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3.5-2B

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.