Qwen3.6 35B A3B — Opus 4.6 Reasoning Distillation (Merged)
A fine-tuned version of Qwen/Qwen3.6-35B-A3B, trained on high-quality reasoning traces distilled from Claude Opus 4.6.
This is the full merged BF16 model in safetensors format. For quantized GGUF versions ready to use with llama.cpp, see rico03/Qwen3.6-35B-Opus-Reasoning-GGUF.
Training Details
- Base model: Qwen/Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters)
- Method: QLoRA (r=16, alpha=16, NF4 4-bit quantization)
- Datasets: Crownelius/Opus-4.6-Reasoning-3300x + TeichAI/Claude-Opus-4.6-Reasoning-887x
- Examples: ~3,046 total
- Epochs: 1
- Final training loss: ~0.64
- Hardware: 1× NVIDIA H100 NVL (94 GB)
- Framework: TRL + PEFT (Hugging Face)
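The setup above can be sketched in TRL + PEFT. Only r=16, alpha=16, NF4, one epoch, and the datasets are stated in this card; the target modules, learning rate, and batch sizes below are illustrative assumptions, not the actual training configuration.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Load the base MoE model in 4-bit NF4, as the card states
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-35B-A3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter: r and alpha from the card; everything else defaulted
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    task_type="CAUSAL_LM",
)

# One of the two reasoning datasets listed above
dataset = load_dataset("Crownelius/Opus-4.6-Reasoning-3300x", split="train")

training_args = SFTConfig(
    num_train_epochs=1,
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=8,   # assumption
    bf16=True,
    output_dir="qwen36-opus-qlora",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```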
Usage with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "rico03/qwen36-35B-opus-reasoning-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "rico03/qwen36-35B-opus-reasoning-merged"
)
Convert to GGUF
# Get llama.cpp and its conversion dependencies
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# Convert the merged safetensors model to an F16 GGUF
python3 llama.cpp/convert_hf_to_gguf.py ./qwen36-35B-opus-reasoning-merged \
    --outfile qwen36-opus-f16.gguf \
    --outtype f16

# Quantize (requires a built llama.cpp; build with cmake first)
./llama.cpp/build/bin/llama-quantize qwen36-opus-f16.gguf qwen36-opus-Q3_K_S.gguf Q3_K_S
./llama.cpp/build/bin/llama-quantize qwen36-opus-f16.gguf qwen36-opus-Q2_K.gguf Q2_K
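The two quantization targets trade file size for quality. A rough back-of-envelope estimate is parameters × bits-per-weight / 8; the bits-per-weight figures below are ballpark assumptions for k-quants, not llama.cpp's exact on-disk layouts.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9

N = 35e9  # total parameters (MoE total, not the ~3B active)
for name, bpw in [("Q3_K_S", 3.5), ("Q2_K", 2.6)]:  # bpw values are assumptions
    print(f"{name}: ~{gguf_size_gb(N, bpw):.0f} GB")
```

This puts Q3_K_S around 15 GB and Q2_K around 11 GB, which is why the quantized versions fit consumer hardware while the F16 file does not.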
What Improved
- Structured reasoning with explicit thinking before answering
- Better multi-step problem solving and agentic coding
- More consistent response formatting
- Improved mathematical and algorithmic reasoning
- Better frontend and repository-level coding workflows
Hardware Requirements
Requires ~70 GB of VRAM or system RAM for full-precision (BF16) inference. For consumer hardware, use the quantized GGUF versions.
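The ~70 GB figure follows directly from the parameter count: every weight is stored in BF16 (2 bytes), and all 35B MoE parameters must be resident in memory even though only ~3B are active per token.

```python
def bf16_footprint_gb(n_params: float) -> float:
    """Memory needed to hold n_params BF16 weights, in decimal GB."""
    return n_params * 2 / 1e9  # 2 bytes per BF16 parameter

print(bf16_footprint_gb(35e9))  # 70.0
```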
License
Apache 2.0 — same as base model.