Qwen3.5 27B Opus Reasoning (MLX 4-bit)
This is an MLX 4-bit quantized version of the Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled model, converted and optimized for Apple Silicon using the Asgard AI Platform (Heimdall) infrastructure.
📊 Heimdall Performance Benchmarks
The following benchmarks were recorded via the Heimdall Engine (MLX backend), running fully locally with no external network interference.
💻 Hardware Configuration
- Platform: Apple M4 Pro
- RAM: 64GB Unified Memory
- Memory Bandwidth: 273 GB/s
- VRAM / Memory Footprint: ~15.0 - 16.2 GB (at 4-bit precision)
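The quoted footprint can be sanity-checked with back-of-the-envelope arithmetic (a rough estimate, not a Heimdall measurement): 27B parameters at 4 bits each come to about 13.5 GB of weights, so the observed 15.0-16.2 GB implies roughly 1.5-2.7 GB of runtime overhead (KV cache, activations, Metal buffers).

```python
# Back-of-the-envelope check of the quoted ~15.0-16.2 GB footprint.
# Weights alone at 4-bit precision: 27e9 params * 0.5 bytes each.
PARAMS = 27e9
BITS_PER_PARAM = 4

weight_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"4-bit weights alone: {weight_gb:.1f} GB")  # 13.5 GB

# The rest of the observed footprint is runtime overhead (KV cache,
# activations, Metal buffers), which grows with context length.
overhead_gb_low = 15.0 - weight_gb
overhead_gb_high = 16.2 - weight_gb
print(f"implied overhead: {overhead_gb_low:.1f}-{overhead_gb_high:.1f} GB")
```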
⚡ Speed and Latency Metrics
| Task Type | Context (Max Tokens) | TTFT (Time to First Token) | TPS (Tokens per Second) |
|---|---|---|---|
| Short (Chat) | 20 | 1.62s - 3.51s | ~12.5 tok/s |
| Medium (RAG) | 200 | ~13.20s | ~15.1 tok/s |
| Long (Reasoning) | 500 | ~32.65s | ~15.3 tok/s |
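Since decode on unified-memory hardware is typically memory-bandwidth-bound, the ~15 tok/s figures above can be sanity-checked against the 273 GB/s bandwidth (a rough roofline estimate, not a Heimdall measurement): each generated token requires streaming roughly all ~13.5 GB of 4-bit weights through memory.

```python
# Rough decode roofline: (memory bandwidth) / (bytes read per token).
BANDWIDTH_GBPS = 273.0             # M4 Pro, from the hardware section
WEIGHTS_GB = 27e9 * 4 / 8 / 1e9    # ~13.5 GB of 4-bit weights

roofline_tps = BANDWIDTH_GBPS / WEIGHTS_GB
print(f"roofline: {roofline_tps:.1f} tok/s")  # ~20.2 tok/s upper bound

# Observed long-context decode speed from the table above.
observed_tps = 15.3
print(f"bandwidth efficiency: {observed_tps / roofline_tps:.0%}")
```

Landing at roughly three quarters of the bandwidth roofline is in the normal range for single-stream 4-bit decode, so the measured numbers are plausible.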
📈 Detailed Benchmark Report
A comprehensive, visualized HTML report containing latency variance, token utilization, and hardware efficiency graphs was generated alongside this run. You can view the full report locally at:
~/Developer/Heimdall/reports/benchmark_20260402_140737.html
⚙️ Usage within Asgard / Heimdall
Load this model using the standard Heimdall start script:
cd ~/Developer/Heimdall
LLM_MODEL="$HOME/Developer/Heimdall/models/Qwen3.5-27B-Opus-Reasoning-MLX-4bit" ./scripts/start.sh
- Model size: 27B params
- Tensor types: BF16 · U32 · F32
Model tree for paripolt/Qwen3.5-27B-Opus-Reasoning-MLX-4bit
- Base model: Qwen/Qwen3.5-27B