datasets:
  - Roman1111111/claude-opus-4.6-10000x
library_name: mlx
---

# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit-MLX

A **5-bit MLX** quantization of [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2).

---

## Quantization Details

| Property | Value |
|----------|-------|
| Method | 5-bit (5.501 bits per weight) |
| Tool | `mlx-lm 0.31.1` via `mlx_lm.convert` |
| Size | ~18.5 GB |

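The bits-per-weight figure lines up with the file size: 27B parameters at 5.501 bits each works out to roughly 18.6 GB. A quick back-of-the-envelope check (the parameter count is approximate, taken from the "27B" in the model name):

```python
# Rough size check: parameters * bits-per-weight / 8 bits-per-byte.
params = 27e9            # approximate parameter count of the 27B base model
bits_per_weight = 5.501  # effective bits per weight reported above

size_bytes = params * bits_per_weight / 8
size_gb = size_bytes / 1e9  # decimal gigabytes, as used in the table

print(f"~{size_gb:.1f} GB")  # ≈ 18.6 GB
```

The small remainder above 5.0 bits comes from quantization overhead (scales and biases stored per group of weights).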
---

## Performance

> Tested on an Apple M1 Max (32 GB), macOS 15.7.5. Average of 5 runs, ~20k tokens generated per run.

| Format | Engine | Model load time | Generation speed |
|--------|--------|-----------------|------------------|
| MLX 5-bit | `mlx-lm 0.31.1` | 2.47 s | 12.43 tokens/sec |
| GGUF Q4_K_M | `llama.cpp 2.8.0` | 1.23 s | 8.73 tokens/sec |

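From the numbers in the table, the MLX build generates tokens about 1.4× faster than the GGUF build, at the cost of roughly twice the load time:

```python
# Ratios derived from the benchmark table above.
mlx_speed, gguf_speed = 12.43, 8.73  # tokens/sec
mlx_load, gguf_load = 2.47, 1.23     # seconds

gen_speedup = mlx_speed / gguf_speed  # how much faster MLX generates
load_ratio = mlx_load / gguf_load     # how much slower MLX loads

print(f"generation: {gen_speedup:.2f}x faster, load: {load_ratio:.2f}x slower")
```

For long generations the one-time load cost is quickly amortized by the higher throughput.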
---

## Reproduce this quantization

```bash
mlx_lm.convert \
  --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --mlx-path ./output \
  -q \
  --q-bits 5
```
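After conversion, a quick smoke test can be run with the `mlx_lm.generate` CLI; the model path and prompt below are placeholders, not part of this repo:

```bash
# Generate from the freshly converted model (path/prompt are examples only)
mlx_lm.generate \
  --model ./output \
  --prompt "Explain quantization in one paragraph." \
  --max-tokens 256
```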

---

## Credits

- [**Alibaba Qwen Team**](https://huggingface.co/Qwen) - [Qwen 3.5 27B](https://huggingface.co/Qwen/Qwen3.5-27B) dense model
- [**Jackrong**](https://huggingface.co/Jackrong) - Claude 4.6 Opus v2 distillation work
- [**Unsloth**](https://unsloth.ai/) - Training framework
- **Apple MLX Team** - High-speed local inference on Apple Silicon