MLX-Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit

A 5-bit MLX quantization of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.


Quantization Details

Property    Value
Method      5-bit (5.501 bits per weight)
Tool        mlx-lm 0.31.1 via mlx_lm.convert
Size        ~18.5 GB
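
The reported 5.501 bits per weight is slightly above the nominal 5 bits, which is what group-wise quantization overhead would produce. The following is a rough sketch of that arithmetic, not a description of how the tool computes its figure. It assumes a group size of 64 with one 16-bit scale and one 16-bit bias per group, and a parameter count of exactly 27e9; none of these values are stated in this card.

```python
# Rough size estimate for a group-wise affine quantization.
# Assumptions (not stated in this card): group size 64, one 16-bit scale
# and one 16-bit bias stored per group, parameter count of exactly 27e9.

def effective_bits_per_weight(bits: int, group_size: int = 64,
                              overhead_bits: int = 32) -> float:
    """Nominal bits plus per-group scale/bias overhead spread over the group."""
    return bits + overhead_bits / group_size


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight(5)       # 5 + 32/64
size = estimated_size_gb(27e9, 5.501)    # uses the reported 5.501 bits per weight
print(f"{bpw:.3f} bits/weight, ~{size:.1f} GB")
```

Under these assumptions the estimate comes out close to the ~18.5 GB listed above. The exact size depends on the true parameter count and on which tensors are left unquantized.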

Performance

Tested on an Apple M1 Max (32 GB) running macOS 15.7.5. Figures are averages over 5 runs of roughly 20k generated tokens each.

Format         Engine             Model load time    Generation speed
MLX 5bit       mlx-lm 0.31.1      2.47 seconds       12.43 tokens/sec
GGUF Q4_K_M    llama.cpp 2.8.0    1.23 seconds       8.73 tokens/sec
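
To put the two rows in perspective, the sketch below derives the relative generation speed and an approximate end-to-end time for one run. The numbers are copied from the table above; the 20,000-token run length is the approximate figure from the test description, and the helper name `total_seconds` is made up for this illustration.

```python
# Figures copied from the performance table above.
results = {
    "MLX 5bit": {"load_s": 2.47, "tokens_per_s": 12.43},
    "GGUF Q4_K_M": {"load_s": 1.23, "tokens_per_s": 8.73},
}
TOKENS = 20_000  # approximate tokens generated per run


def total_seconds(load_s: float, tokens_per_s: float, n_tokens: int = TOKENS) -> float:
    """Model load time plus time to generate n_tokens at a steady rate."""
    return load_s + n_tokens / tokens_per_s


# Ratio of generation speeds (MLX relative to GGUF).
speedup = results["MLX 5bit"]["tokens_per_s"] / results["GGUF Q4_K_M"]["tokens_per_s"]

# Estimated wall-clock time for one full run, per format.
totals = {name: total_seconds(**r) for name, r in results.items()}

print(f"generation speedup: {speedup:.2f}x")
for name, secs in totals.items():
    print(f"{name}: ~{secs / 60:.1f} min per run")
```

On these figures the MLX build generates about 1.4 times faster, and the roughly one-second difference in load time is small next to runs that last tens of minutes. Note that the two rows use different quantization levels, so this is not a like-for-like engine comparison.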

Reproduce this quantization

mlx_lm.convert \
  --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --mlx-path ./output \
  -q \
  --q-bits 5

Credits
