Qwen3.5 27B Opus Reasoning (MLX 4-bit)
This is an MLX 4-bit quantized version of the Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled model, converted and optimized for Apple Silicon using the Asgard AI Platform (Heimdall) infrastructure.
📊 Heimdall Performance Benchmarks
The following benchmarks were recorded via the Heimdall Engine (MLX backend), running fully locally with no external network interference.
💻 Hardware Configuration
- Platform: Apple M4 Pro
- RAM: 64GB Unified Memory
- Memory Bandwidth: 273 GB/s
- VRAM / Memory Footprint: ~15.0 - 16.2 GB (at 4-bit precision)
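The quoted footprint can be sanity-checked with back-of-the-envelope arithmetic (a rough estimate, not a Heimdall measurement): 27B parameters at 4 bits each come to about 13.5 GB of weights, so the observed 15.0-16.2 GB implies roughly 1.5-2.7 GB of runtime overhead (KV cache, activations, Metal buffers).

```python
# Back-of-the-envelope check of the quoted ~15.0-16.2 GB footprint.
# Weights alone at 4-bit precision: 27e9 params * 0.5 bytes each.
PARAMS = 27e9
BITS_PER_PARAM = 4

weight_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"4-bit weights alone: {weight_gb:.1f} GB")  # 13.5 GB

# The rest of the observed footprint is runtime overhead (KV cache,
# activations, Metal buffers), which grows with context length.
overhead_gb_low = 15.0 - weight_gb
overhead_gb_high = 16.2 - weight_gb
print(f"implied overhead: {overhead_gb_low:.1f}-{overhead_gb_high:.1f} GB")
```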
⚡ Speed and Latency Metrics
| Task Type | Context (Max Tokens) | TTFT (Time to First Token) | TPS (Tokens per Second) |
|---|---|---|---|
| Short (Chat) | 20 | 1.62s - 3.51s | ~12.5 tok/s |
| Medium (RAG) | 200 | ~13.20s | ~15.1 tok/s |
| Long (Reasoning) | 500 | ~32.65s | ~15.3 tok/s |
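Since decode on unified-memory hardware is typically memory-bandwidth-bound, the ~15 tok/s figures above can be sanity-checked against the 273 GB/s bandwidth (a rough roofline estimate, not a Heimdall measurement): each generated token requires streaming roughly all ~13.5 GB of 4-bit weights through memory.

```python
# Rough decode roofline: (memory bandwidth) / (bytes read per token).
BANDWIDTH_GBPS = 273.0             # M4 Pro, from the hardware section
WEIGHTS_GB = 27e9 * 4 / 8 / 1e9    # ~13.5 GB of 4-bit weights

roofline_tps = BANDWIDTH_GBPS / WEIGHTS_GB
print(f"roofline: {roofline_tps:.1f} tok/s")  # ~20.2 tok/s upper bound

# Observed long-context decode speed from the table above.
observed_tps = 15.3
print(f"bandwidth efficiency: {observed_tps / roofline_tps:.0%}")
```

Landing at roughly three quarters of the bandwidth roofline is in the normal range for single-stream 4-bit decode, so the measured numbers are plausible.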
📈 Detailed Benchmark Report
A comprehensive, visualized HTML report containing latency variance, token utilization, and hardware efficiency graphs was generated alongside this run. You can view the full report locally at:
~/Developer/Heimdall/reports/benchmark_20260402_140737.html
⚙️ Usage within Asgard / Heimdall
Load this model using the standard Heimdall start script:
cd ~/Developer/Heimdall
LLM_MODEL="$HOME/Developer/Heimdall/models/Qwen3.5-27B-Opus-Reasoning-MLX-4bit" ./scripts/start.sh
- Model size: 27B params
- Tensor types: BF16 · U32 · F32
Model tree for paripolt/Qwen3.5-27B-Opus-Reasoning-MLX-4bit
- Base model: Qwen/Qwen3.5-27B