# MLX-Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit

A 5-bit MLX quantization of `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`.
## Quantization Details
| Property | Value |
|---|---|
| Method | 5-bit (5.501 bits per weight) |
| Tool | mlx-lm 0.31.1 via `mlx_lm.convert` |
| Size | ~18.5 GB |
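The reported size is consistent with the bits-per-weight figure; a quick back-of-the-envelope check using the parameter count and bpw from the table above:

```python
# Rough size estimate: 27B parameters at 5.501 bits per weight.
params = 27e9
bits_per_weight = 5.501
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{size_gb:.1f} GB")  # ≈ 18.6 GB, in line with the ~18.5 GB on disk
```

(The small gap versus the on-disk figure is expected: config files, the tokenizer, and GB-vs-GiB rounding all shift the total slightly.)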
## Performance
Tested on an Apple M1 Max (32 GB), macOS 15.7.5. Figures are the average of 5 runs, ~20k tokens generated per run.

| Format | Engine | Model load time | Generation speed |
|---|---|---|---|
| MLX 5-bit | mlx-lm 0.31.1 | 2.47 s | 12.43 tok/s |
| GGUF Q4_K_M | llama.cpp 2.8.0 | 1.23 s | 8.73 tok/s |
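To put the throughput gap in perspective, here is what those numbers mean over a full ~20k-token run (figures taken from the table above):

```python
mlx_tps = 12.43   # MLX 5-bit generation speed (tok/s)
gguf_tps = 8.73   # GGUF Q4_K_M generation speed (tok/s)

speedup = mlx_tps / gguf_tps
print(f"{speedup:.2f}x")  # ≈ 1.42x faster generation with MLX

tokens = 20_000
print(f"{tokens / mlx_tps / 60:.0f} min vs {tokens / gguf_tps / 60:.0f} min")
# ≈ 27 min vs 38 min per run
```

The GGUF build loads faster, but for long generations the MLX build's higher steady-state throughput dominates total wall-clock time.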
## Reproduce this quantization

```bash
mlx_lm.convert \
  --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --mlx-path ./output \
  -q \
  --q-bits 5
```
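Once converted (or downloaded), the model can be run with the mlx-lm Python API. A minimal sketch; the prompt and token budget here are arbitrary examples, and `./output` is the conversion path used above:

```python
from mlx_lm import load, generate

# Load the quantized model from the local conversion directory
# (a Hugging Face repo id works here as well).
model, tokenizer = load("./output")

prompt = "Explain quantization in one paragraph."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

This requires Apple Silicon; mlx-lm does not run on other hardware.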
## Credits

- Alibaba Qwen Team - Qwen 3.5 27B dense model
- Jackrong - Claude 4.6 Opus v2 distillation work
- Unsloth - training framework
- Apple MLX Team - high-speed local inference on Apple Silicon
## Base model

Qwen/Qwen3.5-27B