Update README.md
README.md
CHANGED
@@ -20,3 +20,44 @@ datasets:
- Roman1111111/claude-opus-4.6-10000x
library_name: mlx
---
# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit-MLX

A **5-bit MLX** quantization of [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2).

---

## Quantization Details

| Property | Value |
|----------|-------|
| Method | 5-bit (5.501 bits per weight) |
| Tool | `mlx-lm 0.31.1` via `mlx_lm.convert` |
| Size | ~18.5 GB |

---

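
The 5.501 bits-per-weight figure in the Quantization Details table can be sanity-checked with a little arithmetic. The sketch below assumes group-wise affine quantization with 64 weights per group and one 16-bit scale plus one 16-bit bias stored per group; none of these values are stated in this README, and the 27e9 parameter count is only the nominal size taken from the model name. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Effective bits per weight = payload bits + per-group overhead spread over the group.
# Assumed, not taken from this README: group size 64, 16-bit scale + 16-bit bias.
effective_bpw() {
  local bits="$1" group_size="$2" overhead_bits="$3"
  awk -v b="$bits" -v g="$group_size" -v o="$overhead_bits" \
    'BEGIN { printf "%.3f\n", b + o / g }'
}

# Rough size in GB (10^9 bytes) for a parameter count and bits per weight.
approx_size_gb() {
  local params="$1" bpw="$2"
  awk -v p="$params" -v w="$bpw" 'BEGIN { printf "%.1f\n", p * w / 8 / 1e9 }'
}

effective_bpw 5 64 32        # 5 payload bits + 32 overhead bits per 64 weights
approx_size_gb 27e9 5.501    # nominal 27B parameters at the reported 5.501 bpw
```

Under these assumptions the estimate comes out just below the reported 5.501, and the size estimate lands close to the listed ~18.5 GB. The small gap is presumably due to tensors that are stored at a different precision.
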
## Performance

> Tested on Apple M1 Max, 32 GB · macOS 15.7.5 · average of 5 runs, ~20k tokens generated per run

| Format | Engine | Model load time | Generation speed |
|--------|--------|-----------------|------------------|
| MLX 5-bit | `mlx-lm 0.31.1` | 2.47 seconds | 12.43 tokens/sec |
| GGUF Q4_K_M | `llama.cpp 2.8.0` | 1.23 seconds | 8.73 tokens/sec |

---

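
To put the Performance numbers in perspective, the following sketch derives the relative speed and load-time ratios and the approximate wall-clock time for one ~20k-token run. It uses only the averages reported in the table; the helper names are illustrative, and the two rows compare different bit widths, so the ratios are indicative rather than a like-for-like benchmark.

```bash
#!/usr/bin/env bash
# Ratio of two measurements, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Minutes needed to generate N tokens at a given tokens/sec rate.
minutes_for() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 60 }'
}

ratio 12.43 8.73          # generation speed, MLX 5-bit relative to GGUF Q4_K_M
ratio 2.47 1.23           # model load time, MLX 5-bit relative to GGUF Q4_K_M
minutes_for 20000 12.43   # one ~20k-token run with MLX 5-bit
minutes_for 20000 8.73    # one ~20k-token run with GGUF Q4_K_M
```
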
### Reproduce this quantization

```bash
mlx_lm.convert \
    --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
    --mlx-path ./output \
    -q \
    --q-bits 5
```
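
After the conversion finishes, a quick smoke test can confirm that the output directory loads and generates text. This is a minimal sketch, not part of the original workflow: it assumes the `mlx_lm.generate` command that ships with `mlx-lm`, reuses the `./output` path from the command above, and uses a placeholder prompt and token limit. The `run_smoke_test` function name is hypothetical.

```bash
#!/usr/bin/env bash
# MODEL_PATH mirrors the --mlx-path used for the conversion; override if needed.
MODEL_PATH="${MODEL_PATH:-./output}"

run_smoke_test() {
  # Skip gracefully when the tool or the converted model is missing.
  if ! command -v mlx_lm.generate >/dev/null 2>&1; then
    echo "skipping: mlx_lm.generate not found; install mlx-lm first"
    return 0
  fi
  if [ ! -d "$MODEL_PATH" ]; then
    echo "skipping: no model directory at $MODEL_PATH; run the conversion first"
    return 0
  fi
  # Placeholder prompt and token limit, chosen only to keep the test short.
  mlx_lm.generate \
    --model "$MODEL_PATH" \
    --prompt "Explain quantization in one sentence." \
    --max-tokens 64
}

run_smoke_test
```
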

---

## Credits

- [**Alibaba Qwen Team**](https://huggingface.co/Qwen) - [Qwen 3.5 27B](https://huggingface.co/Qwen/Qwen3.5-27B) dense model
- [**Jackrong**](https://huggingface.co/Jackrong) - Claude 4.6 Opus v2 distillation work
- [**Unsloth**](https://unsloth.ai/) - Training framework
- **Apple MLX Team** - High-speed local inference on Apple Silicon