datasets:
  - Roman1111111/claude-opus-4.6-10000x
library_name: mlx
---

# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit-MLX

A **5-bit MLX** quantization of [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2).

---

## Quantization Details

| Property | Value |
|----------|-------|
| Method | 5-bit (5.501 bits per weight) |
| Tool | `mlx-lm 0.31.1` via `mlx_lm.convert` |
| Size | ~18.5 GB |

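The bits-per-weight figure lines up with the file size: 27B parameters at 5.501 bits each works out to roughly 18.6 GB. A quick back-of-the-envelope check (the parameter count is approximate, taken from the "27B" in the model name):

```python
# Rough size check: parameters * bits-per-weight / 8 bits-per-byte.
params = 27e9            # approximate parameter count of the 27B base model
bits_per_weight = 5.501  # effective bits per weight reported above

size_bytes = params * bits_per_weight / 8
size_gb = size_bytes / 1e9  # decimal gigabytes, as used in the table

print(f"~{size_gb:.1f} GB")  # ≈ 18.6 GB
```

The small remainder above 5.0 bits comes from quantization overhead (scales and biases stored per group of weights).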
---

## Performance

> Tested on an Apple M1 Max (32 GB), macOS 15.7.5. Average of 5 runs, ~20k tokens generated per run.

| Format | Engine | Model load time | Generation speed |
|--------|--------|-----------------|------------------|
| MLX 5-bit | `mlx-lm 0.31.1` | 2.47 s | 12.43 tokens/sec |
| GGUF Q4_K_M | `llama.cpp 2.8.0` | 1.23 s | 8.73 tokens/sec |

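From the numbers in the table, the MLX build generates tokens about 1.4× faster than the GGUF build, at the cost of roughly twice the load time:

```python
# Ratios derived from the benchmark table above.
mlx_speed, gguf_speed = 12.43, 8.73  # tokens/sec
mlx_load, gguf_load = 2.47, 1.23     # seconds

gen_speedup = mlx_speed / gguf_speed  # how much faster MLX generates
load_ratio = mlx_load / gguf_load     # how much slower MLX loads

print(f"generation: {gen_speedup:.2f}x faster, load: {load_ratio:.2f}x slower")
```

For long generations the one-time load cost is quickly amortized by the higher throughput.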
---

## Reproduce this quantization

```bash
mlx_lm.convert \
  --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --mlx-path ./output \
  -q \
  --q-bits 5
```
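After conversion, a quick smoke test can be run with the `mlx_lm.generate` CLI; the model path and prompt below are placeholders, not part of this repo:

```bash
# Generate from the freshly converted model (path/prompt are examples only)
mlx_lm.generate \
  --model ./output \
  --prompt "Explain quantization in one paragraph." \
  --max-tokens 256
```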

---

## Credits

- [**Alibaba Qwen Team**](https://huggingface.co/Qwen) - [Qwen 3.5 27B](https://huggingface.co/Qwen/Qwen3.5-27B) dense model
- [**Jackrong**](https://huggingface.co/Jackrong) - Claude 4.6 Opus v2 distillation work
- [**Unsloth**](https://unsloth.ai/) - Training framework
- **Apple MLX Team** - High-speed local inference on Apple Silicon