# MLX-Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-5bit

A 5-bit MLX quantization of `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`.
## Quantization Details
| Property | Value |
|---|---|
| Method | 5-bit (5.501 bits per weight) |
| Tool | mlx-lm 0.31.1 via `mlx_lm.convert` |
| Size | ~18.5 GB |
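The reported size is consistent with the bits-per-weight figure; a quick back-of-the-envelope check using the parameter count and bpw from the table above:

```python
# Rough size estimate: 27B parameters at 5.501 bits per weight.
params = 27e9
bits_per_weight = 5.501
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{size_gb:.1f} GB")  # ≈ 18.6 GB, in line with the ~18.5 GB on disk
```

(The small gap versus the on-disk figure is expected: config files, the tokenizer, and GB-vs-GiB rounding all shift the total slightly.)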
## Performance
Tested on an Apple M1 Max (32 GB), macOS 15.7.5. Figures are the average of 5 runs, ~20k tokens generated per run.

| Format | Engine | Model load time | Generation speed |
|---|---|---|---|
| MLX 5-bit | mlx-lm 0.31.1 | 2.47 s | 12.43 tok/s |
| GGUF Q4_K_M | llama.cpp 2.8.0 | 1.23 s | 8.73 tok/s |
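To put the throughput gap in perspective, here is what those numbers mean over a full ~20k-token run (figures taken from the table above):

```python
mlx_tps = 12.43   # MLX 5-bit generation speed (tok/s)
gguf_tps = 8.73   # GGUF Q4_K_M generation speed (tok/s)

speedup = mlx_tps / gguf_tps
print(f"{speedup:.2f}x")  # ≈ 1.42x faster generation with MLX

tokens = 20_000
print(f"{tokens / mlx_tps / 60:.0f} min vs {tokens / gguf_tps / 60:.0f} min")
# ≈ 27 min vs 38 min per run
```

The GGUF build loads faster, but for long generations the MLX build's higher steady-state throughput dominates total wall-clock time.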
## Reproduce this quantization

```bash
mlx_lm.convert \
  --hf-path Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 \
  --mlx-path ./output \
  -q \
  --q-bits 5
```
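Once converted (or downloaded), the model can be run with the mlx-lm Python API. A minimal sketch; the prompt and token budget here are arbitrary examples, and `./output` is the conversion path used above:

```python
from mlx_lm import load, generate

# Load the quantized model from the local conversion directory
# (a Hugging Face repo id works here as well).
model, tokenizer = load("./output")

prompt = "Explain quantization in one paragraph."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

This requires Apple Silicon; mlx-lm does not run on other hardware.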
## Credits

- Alibaba Qwen Team - Qwen 3.5 27B dense model
- Jackrong - Claude 4.6 Opus v2 distillation work
- Unsloth - training framework
- Apple MLX Team - high-speed local inference on Apple Silicon
## Base model

Qwen/Qwen3.5-27B