Qwen3.5-397B-A17B-TurboQuant-MLX-6bit
6-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B (397B total / 17B active Sparse MoE, multimodal) prepared with TurboQuant randomized Hadamard rotations. Optimized for Apple Silicon via MLX.
6-bit sits between 8-bit quality and 4-bit compactness. It suits a 512 GB Mac when you want headroom for long context but do not want to accept the additional fidelity loss seen at 4-bit.
Quickstart
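```python
from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer
model, tokenizer = load("majentik/Qwen3.5-397B-A17B-TurboQuant-MLX-6bit")

# Wrap the user message in the model's chat template
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the plot of Dune in 3 sentences."}],
    add_generation_prompt=True,
)

# verbose=True already streams the output to stdout; the full text is also returned
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```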
Model Specs
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-397B-A17B |
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Total parameters | 397B |
| Active per token | 17B |
| Modalities | Image + Text → Text (image-text-to-text) |
| Context window | 256K tokens |
| Weight quantization | 6-bit MLX (TurboQuant pre-rotation) |
| Approx. disk footprint | ~300 GB |
| License | Apache 2.0 |
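
The disk footprint in the table can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes every one of the 397B parameters is stored at the nominal bit-width and ignores quantization scales, biases, and any layers kept at higher precision, so a real checkpoint will be somewhat larger. The helper name `quantized_size_gb` is made up for this example.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: all parameters use the same bit-width and there is no
    per-group metadata (scales/biases), which real quantized files do carry.
    """
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 397e9  # total parameter count from the Model Specs table

# Compare the nominal footprint at a few common bit-widths
for bits in (4, 6, 8, 16):
    size = quantized_size_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit: {size:.2f} GB")
```

At 6 bits this gives 297.75 GB, which is consistent with the ~300 GB figure above once metadata overhead is added.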
RotorQuant vs TurboQuant
| Aspect | TurboQuant (this repo) | RotorQuant |
|---|---|---|
| Rotation | Randomized Hadamard (static) | Learned orthogonal rotors (data-calibrated) |
| Calibration | Zero-shot | ~512 sample calibration pass |
| Accuracy @ 6-bit | ~99.6% of FP16 baseline | ~99.8% of FP16 baseline |
| Best for | Fastest turnaround | Highest fidelity at same bit-width |
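
To make the "Randomized Hadamard (static)" row concrete, here is a minimal pure-Python sketch of the general technique: flip the sign of each coordinate at random, then apply an orthonormal Walsh-Hadamard transform. It is an illustration of the idea only, not TurboQuant's actual implementation; the function names (`fwht`, `rotate`, `unrotate`) and the toy weight row are made up for this example.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in out]


def random_signs(n, seed=0):
    """Fixed random +/-1 diagonal; no calibration data is involved."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(values, signs):
    """Apply R = H · diag(signs)."""
    return fwht([v * s for v, s in zip(values, signs)])


def unrotate(values, signs):
    """Apply the inverse, diag(signs) · H (the orthonormal H is its own inverse)."""
    return [v * s for v, s in zip(fwht(values), signs)]


# Toy weight row with a single large outlier (hypothetical data)
row = [0.01] * 15 + [8.0]
signs = random_signs(len(row), seed=0)
rotated = rotate(row, signs)
restored = unrotate(rotated, signs)

print("max |w| before rotation:", max(abs(v) for v in row))
print("max |w| after rotation: ", max(abs(v) for v in rotated))
```

The rotation preserves the vector norm but spreads the outlier's energy across all 16 coordinates, so the peak magnitude drops from 8.0 to roughly 2.0. A flatter value range like this is what makes a fixed-bit-width grid easier to fit, and because the rotation is exactly invertible it can be undone (or folded into adjacent layers) after quantization.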
Memory Estimates (6-bit MLX)
| Context | Active memory (approx.) |
|---|---|
| 8K | ~308 GB |
| 32K | ~318 GB |
| 128K | ~348 GB |
| 256K | ~378 GB |
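
For context lengths between the tabulated rows, a rough estimate can be obtained by interpolating linearly. The sketch below assumes "8K" means 8,192 tokens (and so on for the other rows) and that memory grows linearly between rows; both are assumptions of this example, and `estimate_memory_gb` is a hypothetical helper, not part of MLX.

```python
from bisect import bisect_right

# (context tokens, approx. active memory in GB), copied from the table above.
# Assumption: "K" is read as 1,024 tokens.
MEMORY_TABLE = [(8_192, 308), (32_768, 318), (131_072, 348), (262_144, 378)]


def estimate_memory_gb(context_tokens: int) -> float:
    """Piecewise-linear interpolation between the tabulated rows."""
    contexts = [c for c, _ in MEMORY_TABLE]
    if not contexts[0] <= context_tokens <= contexts[-1]:
        raise ValueError("context length is outside the tabulated range")
    # Index of the row at or above the requested context length
    i = min(bisect_right(contexts, context_tokens), len(contexts) - 1)
    (x0, y0), (x1, y1) = MEMORY_TABLE[i - 1], MEMORY_TABLE[i]
    return y0 + (y1 - y0) * (context_tokens - x0) / (x1 - x0)


print(f"64K context: ~{estimate_memory_gb(65_536):.0f} GB")
```

The result is only as good as the table's own approximations; treat it as a planning aid rather than a measurement.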
Hardware Requirements
- Minimum: Apple Silicon with 384 GB+ unified memory (tight)
- Recommended: Mac Studio with 512 GB unified memory
- Does not fit on 96 GB / 128 GB / 192 GB / 256 GB Macs
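
The thresholds above can be turned into a small pre-download check. This is a sketch with hypothetical helper names; `read_unified_memory_gb` assumes macOS, where `sysctl -n hw.memsize` reports physical memory in bytes, and is not called automatically.

```python
import subprocess


def classify_unified_memory(memory_gb: float) -> str:
    """Map a unified-memory size to the tiers listed above."""
    if memory_gb >= 512:
        return "recommended"
    if memory_gb >= 384:
        return "minimum (tight)"
    return "does not fit"


def read_unified_memory_gb() -> float:
    """Read physical memory on macOS (assumption: `sysctl` is available)."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out) / 2**30  # bytes to GiB, matching how Macs are marketed


# Classify the configurations mentioned in this section
for gb in (96, 128, 192, 256, 384, 512):
    print(f"{gb:>3} GB: {classify_unified_memory(gb)}")
```

On a Mac, `classify_unified_memory(read_unified_memory_gb())` reports the tier for the current machine.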
See Also
- TurboQuant MLX: 8-bit · 5-bit · 4-bit · 2-bit
- RotorQuant MLX 6-bit: majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-6bit
- Base model: Qwen/Qwen3.5-397B-A17B