| --- |
| license: apache-2.0 |
| language: |
| - en |
| base_model: |
| - Qwen/Qwen3.5-9B |
| library_name: transformers |
| tags: |
| - qwen |
| - claude |
| - opus |
| - reasoning |
| - distill |
| datasets: |
| - nohurry/Opus-4.6-Reasoning-3000x-filtered |
| - Jackrong/Qwen3.5-reasoning-700x |
| - TeichAI/claude-4.5-opus-high-reasoning-250x |
| - Roman1111111/claude-opus-4.6-10000x |
| --- |
| # Qwen3.5-9B Claude Opus 4.6 Reasoning Distill (GGUF) |
| |
| GGUF quantizations of [empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill](https://huggingface.co/empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill), a reasoning-focused fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B). |
| |
| The model was fine-tuned on reasoning traces from Claude Opus (4.5 and 4.6) and Qwen3.5 so that it produces detailed chain-of-thought reasoning inside `<think>` tags before giving its final answer. |
| |
| ## Quantizations |
| |
| | File | Quant | Size | Description | |
| |------|-------|------|-------------| |
| | `qwen3.5-9b-opus4.6-distill-Q2_K.gguf` | Q2_K | ~3.5 GB | Smallest, lowest quality. For very constrained devices. | |
| | `qwen3.5-9b-opus4.6-distill-Q3_K_M.gguf` | Q3_K_M | ~4.5 GB | Low quality, usable for testing. | |
| | `qwen3.5-9b-opus4.6-distill-Q4_K_M.gguf` | Q4_K_M | ~5.5 GB | **Recommended.** Best balance of quality and size. | |
| | `qwen3.5-9b-opus4.6-distill-Q5_K_M.gguf` | Q5_K_M | ~6.5 GB | High quality, moderate size. | |
| | `qwen3.5-9b-opus4.6-distill-Q6_K.gguf` | Q6_K | ~7.5 GB | Very high quality, near-lossless. | |
| | `qwen3.5-9b-opus4.6-distill-Q8_0.gguf` | Q8_0 | ~9.5 GB | Highest quality quantization. | |
| | `qwen3.5-9b-opus4.6-distill-f16.gguf` | F16 | ~18 GB | Unquantized 16-bit weights, no quantization loss. | |
| |
| For most users, **Q4_K_M** or **Q5_K_M** is the sweet spot. |
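
As a rough way to apply the table above, the following sketch picks the largest quantization whose file fits a given memory budget. The sizes are the approximate figures from the table. The `pick_quant` helper and its `headroom_gb` allowance for the KV cache and runtime buffers are illustrative assumptions, not measured requirements.

```python
# Approximate file sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {
    "Q2_K": 3.5,
    "Q3_K_M": 4.5,
    "Q4_K_M": 5.5,
    "Q5_K_M": 6.5,
    "Q6_K": 7.5,
    "Q8_0": 9.5,
    "F16": 18.0,
}


def pick_quant(memory_gb, headroom_gb=1.5):
    """Return the name of the largest quant that fits in memory_gb, or None.

    headroom_gb is an assumed allowance for the KV cache and runtime
    buffers; the real overhead depends on context length and backend.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    # Tuples compare by size first, so max() yields the largest fitting file.
    return max(fitting)[1]


if __name__ == "__main__":
    for memory_gb in (4, 8, 12, 24):
        print(memory_gb, "GB ->", pick_quant(memory_gb))
```

Treat the result as a starting point only: long contexts can need considerably more headroom than the default assumed here.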
| |
| ## How to Use |
| |
| ### llama.cpp |
| |
| ```bash |
| # -e tells llama-cli to process the \n escape sequences in the prompt string |
| llama-cli -m qwen3.5-9b-opus4.6-distill-Q5_K_M.gguf -e -p "<|im_start|>system\nYou are a deep reasoning AI. Think carefully inside <think> tags before answering.<|im_end|>\n<|im_start|>user\nExplain why the sky is blue.<|im_end|>\n<|im_start|>assistant\n" -n 2048 |
| ``` |
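
The prompt string in the command above follows the ChatML layout: each turn is wrapped in `<|im_start|>` and `<|im_end|>`, and the prompt ends with an open assistant turn. A minimal Python sketch of how such a string could be assembled is shown here. The `build_chatml_prompt` helper is hypothetical and is not part of llama.cpp or this repository.

```python
def build_chatml_prompt(messages):
    """Render (role, content) pairs as ChatML and open the assistant turn."""
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt([
        ("system", "You are a deep reasoning AI. Think carefully inside <think> tags before answering."),
        ("user", "Explain why the sky is blue."),
    ])
    print(prompt)
```

Front ends that apply a chat template automatically should make this manual formatting unnecessary.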
| |
| ### Ollama |
| |
| ```bash |
| ollama run empero-ai/qwen3.5-9b-opus4.6-distill |
| ``` |
| |
| ### LM Studio / GPT4All / Jan |
| |
| Download the GGUF file of your choice and load it directly in the application. |
| |
| ## Training Details |
| |
| ### Method |
| |
| - **Stage 1 (SFT, supervised fine-tuning):** 3 epochs on ~13K examples teaching the model the `<think>` reasoning format using QLoRA (4-bit, rank 64, alpha 128) |
| - **Base model:** Qwen/Qwen3.5-9B |
| - **Hardware:** RTX 5090 (32GB VRAM) |
| - **Attention:** SDPA |
| - **Optimizer:** Paged AdamW 8-bit |
| - **Learning rate:** 1e-4 with cosine schedule |
| - **Effective batch size:** 8 (batch 1 × gradient accumulation 8) |
| - **Max sequence length:** 4096 |
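
The hyperparameters above imply a few derived quantities, sketched below. The sketch assumes standard LoRA scaling (alpha divided by rank) and a cosine schedule that decays to zero without warmup. The card states neither detail, so both are assumptions, and `cosine_lr` is an illustrative helper rather than the training code.

```python
import math

RANK, ALPHA = 64, 128                # LoRA rank and alpha from the list above
PER_DEVICE_BATCH, GRAD_ACCUM = 1, 8  # batch size and gradient accumulation
PEAK_LR = 1e-4

# Standard LoRA scaling factor alpha / r (assumed; rsLoRA would differ).
lora_scale = ALPHA / RANK
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0; warmup is omitted (assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("LoRA scale:", lora_scale)
    print("Effective batch size:", effective_batch)
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```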
| |
| ### SFT Results |
| |
| | Metric | Epoch 1 | Epoch 2 (best) | Epoch 3 | |
| |--------|---------|-----------------|---------| |
| | Eval Loss | 0.5205 | **0.4809** | 0.4915 | |
| | Eval Token Accuracy | 0.8494 | **0.8615** | 0.8617 | |
| | Eval Entropy | 0.508 | 0.434 | 0.394 | |
| |
| Epoch 2 had the lowest eval loss and was selected as the best checkpoint via `load_best_model_at_end`. |
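
The selection can be reproduced from the table with a few lines of Python. The sketch assumes the comparison metric was eval loss, which is the Hugging Face `Trainer` default when `metric_for_best_model` is not set. The card does not state which metric was configured.

```python
# Values copied from the SFT results table, keyed by epoch.
eval_loss = {1: 0.5205, 2: 0.4809, 3: 0.4915}
eval_token_accuracy = {1: 0.8494, 2: 0.8615, 3: 0.8617}

# Lower loss is better; higher accuracy is better.
best_by_loss = min(eval_loss, key=eval_loss.get)
best_by_accuracy = max(eval_token_accuracy, key=eval_token_accuracy.get)

if __name__ == "__main__":
    print("best epoch by eval loss:", best_by_loss)
    print("best epoch by token accuracy:", best_by_accuracy)
```

The two criteria disagree slightly: epoch 3 has marginally higher token accuracy, while its rising eval loss suggests the start of overfitting.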
| |
| ### Datasets |
| |
| | Dataset | Examples | Type | |
| |---------|----------|------| |
| | [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Problem → thinking → solution | |
| | [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) | 633 | ShareGPT with `<think>` tags | |
| | [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Messages with `<think>` tags | |
| | [Roman1111111/claude-opus-4.6-10000x](https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x) | 9,631 | Messages with reasoning traces | |
| | **Total** | **12,840** | | |
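
To see how the training mix is weighted, the example counts in the table can be turned into shares of the total. This is plain arithmetic on the figures above and says nothing about how the data was split into training and evaluation sets.

```python
# Example counts copied from the datasets table.
DATASET_SIZES = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "Jackrong/Qwen3.5-reasoning-700x": 633,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "Roman1111111/claude-opus-4.6-10000x": 9631,
}

total = sum(DATASET_SIZES.values())
shares = {name: count / total for name, count in DATASET_SIZES.items()}

if __name__ == "__main__":
    print("total:", total)
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```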
| |
| ### Output Format |
| |
| The model outputs reasoning in `<think>` tags followed by its final answer: |
| |
| ``` |
| <think> |
| The user is asking about why the sky appears blue. This involves Rayleigh scattering... |
| |
| Sunlight contains all wavelengths of visible light. When it enters Earth's atmosphere, |
| shorter wavelengths (blue/violet) scatter more than longer wavelengths (red/orange)... |
| |
| While violet actually scatters more than blue, our eyes are more sensitive to blue light, |
| and some violet is absorbed by the upper atmosphere... |
| </think> |
| |
| The sky appears blue due to Rayleigh scattering. When sunlight passes through Earth's |
| atmosphere, the shorter blue wavelengths scatter in all directions more than the longer |
| red wavelengths. Although violet light scatters even more, our eyes are more sensitive |
| to blue, and some violet is absorbed higher in the atmosphere, so we perceive the sky |
| as blue. |
| ``` |
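
Applications that only want the final answer can strip the reasoning block. The sketch below is one way to split such output and assumes a single, well-formed `<think>...</think>` block. The `split_reasoning` helper is hypothetical and does not handle truncated generations with an unclosed tag.

```python
import re

# Non-greedy match so only the first think block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is "" if no think block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>\nRayleigh scattering favors short wavelengths.\n</think>\n\nThe sky appears blue."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```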
| |
| ## About Empero AI |
| |
| This model was developed by [Empero AI](https://empero.org). We build open-source AI tools and models focused on advancing reasoning capabilities in smaller, efficient language models. |
| |
| ## License |
| |
| This model inherits the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license from Qwen3.5-9B. |