# Qwen3.5-2b-Kimi-and-Opus-Distillation
This model is a distilled version of Qwen 3.5 (2B), fine-tuned on high-quality reasoning and conversational datasets. The distillation process uses responses from Kimi K2.5 and Claude 4.6 Opus to strengthen the reasoning capabilities and conversational depth of the lightweight 2B-parameter model.
## Model Details
- Developed by: ertghiu256
- Base Model: Qwen/Qwen3.5-2B
- Language(s): English
- License: Apache 2.0
- Finetuning Technique: Supervised Fine-Tuning (SFT) / Distillation
## Training Metadata
The model was trained with an emphasis on high-quality data density rather than sheer volume, focusing on a specific subset of data to optimize performance within a short training window.
- Training Hardware: 1x T4 (Google Colab/GCP)
- Total Training Time: ~2 Hours
- Max Training Steps: 70
- Learning Rate: 1e-4
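For reference, the reported run settings can be collected into a single configuration object. This is only a summary of the values stated above; the actual training script, and any unstated settings such as batch size, optimizer, or scheduler, has not been published.

```python
# Summary of the reported training run as a plain Python dict.
# Only values stated in this card are included; batch size, optimizer,
# and scheduler were not published, so they are deliberately left out.
training_config = {
    "learning_rate": 1e-4,          # reported learning rate
    "max_steps": 70,                # reported max training steps
    "hardware": "1x NVIDIA T4",     # Google Colab / GCP
    "approx_wall_time_hours": 2.0,  # reported total training time
}
```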
## Dataset Composition
The model was trained on a curated mixture of the following three datasets:
- Ali-Yaser/KIMI-K2.5-450000x-ShareGPT: Large-scale conversational data distilled from Kimi, providing natural, helpful, and long-context-aware dialogue.
- allenai/tulu-3-sft-mixture: A diverse, high-quality SFT mixture designed to improve general instruction-following capabilities.
- nohurry/Opus-4.6-Reasoning-3000x-filtered: A highly filtered set of complex reasoning chains distilled from Claude 4.6 Opus, designed to improve the model's logical "Chain of Thought" (CoT) processes.
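One way to combine such sources is weighted sampling, where each training example is drawn from one dataset according to a mixing ratio. The sketch below illustrates that idea; the actual mixing ratios for this model are not published, so the weights here are placeholders, not the real recipe.

```python
import random

# Hypothetical mixing weights for illustration only -- the card does not
# state the real ratios used during training.
SOURCES = {
    "Ali-Yaser/KIMI-K2.5-450000x-ShareGPT": 0.5,
    "allenai/tulu-3-sft-mixture": 0.3,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 0.2,
}

def sample_source(rng: random.Random) -> str:
    """Pick which dataset the next training example is drawn from."""
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Draw 1000 examples' worth of source assignments with a fixed seed.
rng = random.Random(0)
draws = [sample_source(rng) for _ in range(1000)]
```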
## Intended Use
- Reasoning-heavy tasks: Despite its size, the model is tuned to handle logical queries better than the base Qwen3.5-2B model.
- Mobile/Edge Deployment: Due to its 2B parameter count, it is ideal for local-first applications.
- Conversational AI: High-quality dialogue based on Kimi and Opus styles.
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ertghiu256/Qwen3.5-2b-Kimi-and-Opus-Distillation"

# Load the model and tokenizer (device_map="auto" places weights on GPU if available)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the importance of distillation in small language models."
messages = [
    {"role": "system", "content": "You are a helpful and logical assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the model's reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Limitations and Biases
While the distillation from Opus and Kimi provides a significant boost, the model's 2B parameter size still limits its "world knowledge" compared to larger models. Users should verify factual claims. The model may inherit biases present in the distillation source datasets.
## Uploaded Finetuned Model
- Developed by: ertghiu256
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3.5-2B

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.