empero-ai committed · Commit cc9955b · verified · 1 Parent(s): 963a36d

Update README.md

---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3.5-9B
library_name: transformers
tags:
- qwen
- claude
- opus
- reasoning
- distill
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Jackrong/Qwen3.5-reasoning-700x
- TeichAI/claude-4.5-opus-high-reasoning-250x
- Roman1111111/claude-opus-4.6-10000x
---
# Qwen3.5-9B Claude Opus 4.6 Reasoning Distill — GGUF

GGUF quantizations of [empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill](https://huggingface.co/empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill), a reasoning-focused fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B).

This model was trained to produce detailed chain-of-thought reasoning inside `<think>` tags before giving its final answer, distilled from Claude Opus 4.6 and Qwen3.5 reasoning traces.
## Quantizations

| File | Quant | Size | Description |
|------|-------|------|-------------|
| `qwen3.5-9b-opus4.6-distill-Q2_K.gguf` | Q2_K | ~3.5 GB | Smallest, lowest quality. For very constrained devices. |
| `qwen3.5-9b-opus4.6-distill-Q3_K_M.gguf` | Q3_K_M | ~4.5 GB | Low quality, usable for testing. |
| `qwen3.5-9b-opus4.6-distill-Q4_K_M.gguf` | Q4_K_M | ~5.5 GB | **Recommended.** Best balance of quality and size. |
| `qwen3.5-9b-opus4.6-distill-Q5_K_M.gguf` | Q5_K_M | ~6.5 GB | High quality, moderate size. |
| `qwen3.5-9b-opus4.6-distill-Q6_K.gguf` | Q6_K | ~7.5 GB | Very high quality, near-lossless. |
| `qwen3.5-9b-opus4.6-distill-Q8_0.gguf` | Q8_0 | ~9.5 GB | Highest quality quantization. |
| `qwen3.5-9b-opus4.6-distill-f16.gguf` | F16 | ~18 GB | Full precision, no quantization loss. |

For most users, **Q4_K_M** or **Q5_K_M** is the sweet spot.
## How to Use

### llama.cpp

```bash
llama-cli -m qwen3.5-9b-opus4.6-distill-Q5_K_M.gguf -p "<|im_start|>system\nYou are a deep reasoning AI. Think carefully inside <think> tags before answering.<|im_end|>\n<|im_start|>user\nExplain why the sky is blue.<|im_end|>\n<|im_start|>assistant\n" -n 2048
```
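The prompt in the command above follows the ChatML template Qwen models use; assembling it in code avoids shell-escaping mistakes. A minimal sketch (the system message is just the example from the command):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt like the one passed to llama-cli above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a deep reasoning AI. Think carefully inside <think> tags before answering.",
    "Explain why the sky is blue.",
)
print(prompt)
```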
### Ollama

```bash
ollama run empero-ai/qwen3.5-9b-opus4.6-distill
```

### LM Studio / GPT4All / Jan

Download the GGUF file of your choice and load it directly in the application.
## Training Details

### Method

- **Stage 1 — SFT (Supervised Fine-Tuning):** 3 epochs on ~13K examples teaching the model the `<think>` reasoning format using QLoRA (4-bit, rank 64, alpha 128)
- **Base model:** Qwen/Qwen3.5-9B
- **Hardware:** RTX 5090 (32GB VRAM)
- **Attention:** SDPA
- **Optimizer:** Paged AdamW 8-bit
- **Learning rate:** 1e-4 with cosine schedule
- **Effective batch size:** 8 (batch 1 × gradient accumulation 8)
- **Max sequence length:** 4096
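The hyperparameters above map onto a QLoRA setup roughly like the following. This is a hypothetical reconstruction from the bullet list using `transformers` and `peft`, not the author's actual training script; model and dataset loading (and the 4096 max sequence length, which would be passed to the trainer) are elided.

```python
# Hypothetical reconstruction of the SFT config from the bullets above.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(        # 4-bit base model (QLoRA)
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(               # rank 64, alpha 128
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen3.5-9b-opus4.6-distill-sft",
    num_train_epochs=3,
    per_device_train_batch_size=1,      # effective batch 8 via accumulation
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    eval_strategy="epoch",              # required for load_best_model_at_end
    save_strategy="epoch",
    load_best_model_at_end=True,        # selects the epoch-2 checkpoint
)
```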
### SFT Results

| Metric | Epoch 1 | Epoch 2 (best) | Epoch 3 |
|--------|---------|-----------------|---------|
| Eval Loss | 0.5205 | **0.4809** | 0.4915 |
| Eval Token Accuracy | 0.8494 | **0.8615** | 0.8617 |
| Eval Entropy | 0.508 | 0.434 | 0.394 |

Best checkpoint (epoch 2) was selected via `load_best_model_at_end`.
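With the default metric, `load_best_model_at_end` amounts to keeping the epoch with the lowest eval loss. A toy sketch using the numbers from the table above:

```python
# Eval losses per epoch, from the SFT Results table above.
eval_loss = {1: 0.5205, 2: 0.4809, 3: 0.4915}

# Lowest eval loss wins, matching the epoch-2 checkpoint selection.
best_epoch = min(eval_loss, key=eval_loss.get)
print(best_epoch)  # 2
```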
### Datasets

| Dataset | Examples | Type |
|---------|----------|------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Problem → thinking → solution |
| [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) | 633 | ShareGPT with `<think>` tags |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Messages with `<think>` tags |
| [Roman1111111/claude-opus-4.6-10000x](https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x) | 9,631 | Messages with reasoning traces |
| **Total** | **12,840** | |
### Output Format

The model outputs reasoning in `<think>` tags followed by its final answer:

```
<think>
The user is asking about why the sky appears blue. This involves Rayleigh scattering...

Sunlight contains all wavelengths of visible light. When it enters Earth's atmosphere,
shorter wavelengths (blue/violet) scatter more than longer wavelengths (red/orange)...

While violet actually scatters more than blue, our eyes are more sensitive to blue light,
and some violet is absorbed by the upper atmosphere...
</think>

The sky appears blue due to Rayleigh scattering. When sunlight passes through Earth's
atmosphere, the shorter blue wavelengths scatter in all directions more than the longer
red wavelengths. Although violet light scatters even more, our eyes are more sensitive
to blue, and some violet is absorbed higher in the atmosphere — so we perceive the sky
as blue.
```
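Downstream code usually wants the final answer separated from the reasoning. A minimal sketch of splitting on the `<think>` block (this assumes the model emits at most one well-formed block, which is the trained format but not guaranteed at inference time):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, final_answer).

    If no <think>...</think> block is found, the whole text is
    treated as the answer and the reasoning is empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>\nRayleigh scattering...\n</think>\n\nThe sky appears blue."
thinking, answer = split_reasoning(sample)
print(answer)  # The sky appears blue.
```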
## License

This model inherits the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license from Qwen3.5-9B.