# Qwen3.5-397B-A17B-TurboQuant-MLX-6bit

6-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B (397B total / 17B active Sparse MoE, multimodal) prepared with TurboQuant randomized Hadamard rotations. Optimized for Apple Silicon via MLX.

6-bit hits a sweet spot between 8-bit quality and 4-bit compactness: it leaves headroom for long context on a 512 GB Mac while avoiding the last few percentage points of fidelity loss seen at 4-bit.
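The trade-off is easy to sanity-check with raw weight-byte arithmetic (this ignores per-group scale/bias overhead and any layers kept at higher precision, so real checkpoints run somewhat larger):

```python
# Raw weight storage for 397B parameters at common bit-widths
# (bytes = params * bits / 8; overhead from quantization metadata excluded)
params = 397e9
for bits in (4, 6, 8):
    print(f"{bits}-bit: ~{params * bits / 8 / 1e9:.0f} GB")
```

At 6 bits this lands near 298 GB of raw weights, consistent with the ~300 GB footprint listed below.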

## Quickstart

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Qwen3.5-397B-A17B-TurboQuant-MLX-6bit")

# apply_chat_template wraps the message in the model's chat format
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the plot of Dune in 3 sentences."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True))
```

## Model Specs

| Property | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-397B-A17B |
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Total parameters | 397B |
| Active per token | 17B |
| Modalities | Image + Text → Text (image-text-to-text) |
| Context window | 256K tokens |
| Weight quantization | 6-bit MLX (TurboQuant pre-rotation) |
| Approx. disk footprint | ~300 GB |
| License | Apache 2.0 |
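Because only 17B of the 397B parameters are active per token, decode throughput is governed by the active expert bytes rather than the full model. A back-of-envelope ceiling (the 800 GB/s memory-bandwidth figure is an illustrative assumption, not a measured number for any specific machine):

```python
# Decode is typically memory-bandwidth bound: each generated token must
# stream the active experts' weights (~17B params at 6 bits) from memory.
active_bytes = 17e9 * 6 / 8   # ~12.75 GB touched per token
bandwidth = 800e9             # bytes/s -- assumed unified-memory bandwidth
print(f"~{bandwidth / active_bytes:.0f} tok/s theoretical ceiling")
```

Real throughput will sit well below this bound once attention, KV-cache reads, and dequantization overhead are included.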

## TurboQuant vs RotorQuant

| Aspect | TurboQuant (this repo) | RotorQuant |
| --- | --- | --- |
| Rotation | Randomized Hadamard (static) | Learned orthogonal rotors (data-calibrated) |
| Calibration | Zero-shot (none needed) | ~512-sample calibration pass |
| Accuracy @ 6-bit | ~99.6% of FP16 baseline | ~99.8% of FP16 baseline |
| Best for | Fastest turnaround | Highest fidelity at the same bit-width |
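The randomized-Hadamard idea behind TurboQuant's pre-rotation can be illustrated in a few lines of numpy (a sketch of the general technique, not this repo's actual kernels): an orthogonal rotation preserves the layer's function but spreads outlier weight energy across all coordinates, shrinking the dynamic range the 6-bit grid has to cover.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 64
H = hadamard(n) / np.sqrt(n)              # orthonormal Hadamard matrix
D = np.diag(rng.choice([-1.0, 1.0], n))   # random sign flips
R = H @ D                                 # randomized Hadamard rotation

# A weight row with one large outlier quantizes badly as-is; after
# rotation the same energy is spread across all 64 coordinates, so the
# peak magnitude drops sharply while the norm is unchanged.
w = rng.normal(0, 0.1, n)
w[0] += 10.0
print(np.max(np.abs(w)), np.max(np.abs(w @ R)))
```

Because the rotation is a fixed orthogonal matrix, it can be folded into adjacent weight matrices at conversion time, which is why no calibration data is needed.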

## Memory Estimates (6-bit MLX)

| Context | Active memory (approx.) |
| --- | --- |
| 8K | ~308 GB |
| 32K | ~318 GB |
| 128K | ~348 GB |
| 256K | ~378 GB |
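The deltas between rows give a rough per-token cache cost, which can be back-solved directly (a sketch; the actual KV layout depends on layer count, KV heads, and cache precision, none of which are stated here):

```python
# Approximate active memory (GB) per context length, from the table above
mem = {8_192: 308, 32_768: 318, 131_072: 348, 262_144: 378}

ctxs = sorted(mem)
for a, b in zip(ctxs, ctxs[1:]):
    kb_per_tok = (mem[b] - mem[a]) * 1e9 / (b - a) / 1e3
    print(f"{a // 1024}K -> {b // 1024}K: ~{kb_per_tok:.0f} KB/token")
```

The implied cost per token shrinks at longer contexts, so treat the table as rounded envelopes rather than a linear model.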

## Hardware Requirements

- Minimum: Apple Silicon with 384 GB+ unified memory (tight)
- Recommended: Mac Studio with 512 GB unified memory
- Does not fit on 96 GB / 128 GB / 192 GB / 256 GB Macs
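A quick way to check a given machine against the memory table above (a heuristic sketch: the table values come from this card, while the 75% GPU-usable fraction is an assumed macOS default, adjustable via the `iogpu.wired_limit_mb` sysctl):

```python
# Estimated active memory (GB) by context length, from the table above
MEM_GB = {8_192: 308, 32_768: 318, 131_072: 348, 262_144: 378}

def fits(ram_gb: float, context: int, gpu_fraction: float = 0.75) -> bool:
    """True if model weights plus KV cache should fit in GPU-usable memory."""
    need = min(gb for ctx, gb in MEM_GB.items() if ctx >= context)
    return ram_gb * gpu_fraction >= need

print(fits(512, 131_072))  # 512 GB Mac Studio at 128K context
print(fits(256, 8_192))    # 256 GB machines cannot hold the weights
```

Note that at the stated 384 GB minimum, this check only passes if the wired limit is raised above the assumed default fraction, which matches the "tight" caveat above.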

