lordx64 committed · verified
Commit be85f12 · Parent: 7773ce4

init: GGUF quant repo for distilled Qwen3.6-35B-A3B

Files changed (1): README.md added (+60 −0)
---
base_model: lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled
library_name: gguf
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- lmstudio
- reasoning
- chain-of-thought
- qwen
- qwen3.6
- moe
- distillation
quantized_by: lordx64
license: apache-2.0
---

# Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled-GGUF

GGUF quantizations of [`lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled`](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled) for use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and [LM Studio](https://lmstudio.ai/).

The base model is a reasoning-distilled variant of Qwen3.6-35B-A3B, fine-tuned to imitate the chain-of-thought style of Kimi K2.6 (the teacher model named in the repo id). It reasons in explicit `<think>...</think>` blocks before producing the final answer.
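Downstream code often wants only the final answer, not the reasoning block. A minimal sketch for stripping it client-side (an assumption of this example: the tags arrive literally and well-formed in the output text, rather than being filtered by your client or chat template):

```python
import re

# Matches a complete <think>...</think> block plus trailing whitespace.
# Streaming or unclosed tags are not handled; this assumes the full,
# well-formed response text is available.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, keeping the final answer."""
    return THINK_RE.sub("", text)

sample = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
print(strip_think(sample))  # -> The answer is 4.
```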

## Quant files

See the file list for all available quant levels. Common choices:

| File | Quant | Approx. size | Use case |
|---|---|---|---|
| `*.IQ4_XS.gguf` | IQ4_XS | ~18 GB | Smallest quant with good quality; default pick for LM Studio |
| `*.Q4_K_M.gguf` | Q4_K_M | ~21 GB | Balanced quality / size |
| `*.Q5_K_M.gguf` | Q5_K_M | ~25 GB | Higher quality |
| `*.Q8_0.gguf` | Q8_0 | ~35 GB | Near-lossless |
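The sizes above follow roughly from bits per weight: a 35B-parameter model at *b* bits per weight occupies about 35e9 × b / 8 bytes, plus metadata and embedding overhead. A back-of-envelope check (the bpw figures below are approximate community averages for each scheme, not exact values for these files):

```python
# Rough GGUF size estimate: params * bits-per-weight / 8.
# APPROX_BPW values are assumptions -- average effective bits per weight
# for each quant scheme; real files mix tensor types and add overhead.
PARAMS = 35e9

APPROX_BPW = {"IQ4_XS": 4.25, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5}

def est_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated file size in GB for a given quant scheme."""
    return params * APPROX_BPW[quant] / 8 / 1e9

for q in APPROX_BPW:
    print(f"{q}: ~{est_size_gb(q):.0f} GB")
```

The estimates land within a couple of GB of the table above, which is a useful sanity check when deciding whether a quant fits in your RAM/VRAM.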

## Running in llama.cpp

```bash
llama-server \
  -m Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled.IQ4_XS.gguf \
  --host 127.0.0.1 --port 18081 \
  -c 32768 -fa on \
  --cache-type-k q8_0 --cache-type-v q8_0
```

Note: a quantized V cache (`--cache-type-v`) requires flash attention, which `-fa on` enables.
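`llama-server` exposes an OpenAI-compatible API on the configured host and port. A minimal sketch of a chat request against the server started above, using only the standard library (the URL and port match the flags in the command; prompt and sampling parameters are illustrative):

```python
import json
from urllib import request

def build_chat_request(prompt: str, base_url: str = "http://127.0.0.1:18081"):
    """Build an OpenAI-style chat completion request for llama-server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.6,
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires a running llama-server; prints the full reply,
    # including any leading <think>...</think> reasoning block.
    with request.urlopen(build_chat_request("What is 2+2?")) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```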
49
+
50
+ ## Running in LM Studio
51
+
52
+ Search for `lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled-GGUF` inside LM Studio's model browser and pick the quant
53
+ that fits your RAM/VRAM. The model should appear automatically once HF indexes
54
+ this repo.
55
+
56
+ ## License
57
+
58
+ Apache 2.0, inherited from the base model. See
59
+ [`lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled`](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled) for training details,
60
+ evaluations, and intended use.