Qwen3.5 27B Opus Reasoning (MLX 4-bit)

This is an MLX 4-bit quantized version of the Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled model, optimized natively for Apple Silicon using the Asgard AI Platform (Heimdall) infrastructure.

📊 Heimdall Performance Benchmarks

The following benchmarks were recorded locally via the Heimdall Engine (MLX backend), with no external network interference.

💻 Hardware Configuration

  • Platform: Apple M4 Pro
  • RAM: 64GB Unified Memory
  • Memory Bandwidth: 273 GB/s
  • VRAM / Memory Footprint: ~15.0 - 16.2 GB (at 4-bit precision)
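The reported footprint is consistent with a back-of-envelope estimate: 27B parameters at roughly 4.5 bits per weight (4-bit weights plus group-wise quantization scales; the 4.5 figure is an assumption, not measured from this checkpoint) comes to about 15 GB before runtime overhead. A minimal sketch:

```python
def quantized_footprint_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough weight-memory estimate for a quantized model.

    bits_per_weight includes quantization overhead (per-group scales/biases);
    ~4.5 bits/weight is a common assumption for 4-bit, group-size-64 schemes.
    Returns decimal gigabytes.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

print(round(quantized_footprint_gb(27), 1))  # → 15.2, in line with the ~15.0 - 16.2 GB observed
```

Activations, the KV cache, and framework overhead account for the gap up to the observed ceiling.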

⚡ Speed and Latency Metrics

| Task Type | Context (Max Tokens) | TTFT (Time to First Token) | TPS (Tokens per Second) |
|---|---|---|---|
| Short (Chat) | 20 | 1.62s - 3.51s | ~12.5 tok/s |
| Medium (RAG) | 200 | ~13.20s | ~15.1 tok/s |
| Long (Reasoning) | 500 | ~32.65s | ~15.3 tok/s |
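End-to-end request latency follows directly from these two metrics: total time ≈ TTFT + generated_tokens / TPS. A minimal sketch using the long-reasoning row above:

```python
def total_latency_s(ttft_s: float, tokens: int, tps: float) -> float:
    """Approximate wall-clock time: prompt processing (TTFT) plus decoding."""
    return ttft_s + tokens / tps

# Long (Reasoning) row: 500 tokens at ~15.3 tok/s after a ~32.65 s TTFT
print(round(total_latency_s(32.65, 500, 15.3), 1))  # → 65.3 seconds end to end
```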

📈 Detailed Benchmark Report

A comprehensive, visualized HTML report containing latency variance, token utilization, and hardware efficiency graphs was generated alongside this run. You can view the full report locally at: ~/Developer/Heimdall/reports/benchmark_20260402_140737.html

⚙️ Usage within Asgard / Heimdall

Load the model with the standard Heimdall start script:

```shell
cd ~/Developer/Heimdall
LLM_MODEL="$HOME/Developer/Heimdall/models/Qwen3.5-27B-Opus-Reasoning-MLX-4bit" ./scripts/start.sh
```
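Outside Heimdall, the checkpoint should also load with the standard mlx-lm tooling. The sketch below assumes `mlx-lm` is installed and uses the Hub repository ID; it has not been verified against this specific checkpoint:

```python
from mlx_lm import load, generate

# Hub ID assumed from this model card; a local path also works
model, tokenizer = load("paripolt/Qwen3.5-27B-Opus-Reasoning-MLX-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain unified memory in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
)
text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
```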