# Qwen3.5-397B-A17B-RotorQuant-MLX-4bit
4-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B (397B total / 17B active Sparse MoE, multimodal) prepared with RotorQuant learned orthogonal rotors. Optimized for Apple Silicon via MLX.
4-bit RotorQuant is our recommended default for 256 GB Mac Studios: highest fidelity attainable at 4-bit while preserving most of the long-context reasoning capability of the FP16 original.
## Quickstart

Text generation with `mlx-lm`:

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Draft a short release note for a new MoE feature."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True))
```
Multimodal via `mlx-vlm`:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-4bit")

prompt = apply_chat_template(
    processor,
    config=model.config,
    prompt="What does this UI screenshot show?",
    num_images=1,
)
print(generate(model, processor, prompt, image=["./screenshot.png"], max_tokens=512))
```
## Model Specs
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-397B-A17B |
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Total parameters | 397B |
| Active per token | 17B |
| Modalities | Image + Text → Text (image-text-to-text) |
| Context window | 256K tokens |
| Weight quantization | 4-bit MLX (RotorQuant learned rotors) |
| Approx. disk footprint | ~220 GB |
| License | Apache 2.0 |
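The ~220 GB disk figure is consistent with a back-of-envelope estimate, assuming roughly 4.5 effective bits per weight once per-group quantization scales are included (the exact overhead is an assumption here; it varies with group size):

```python
# Back-of-envelope disk footprint (assumption: ~4.5 effective bits per
# weight after accounting for per-group quantization scales).
total_params = 397e9   # total parameters, from the spec table
bits_per_weight = 4.5  # 4-bit weights + quantization metadata
footprint_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{footprint_gb:.0f} GB")  # → ~223 GB, close to the ~220 GB listed above
```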
## RotorQuant vs TurboQuant
| Aspect | RotorQuant (this repo) | TurboQuant |
|---|---|---|
| Rotation | Learned orthogonal rotors (data-calibrated) | Randomized Hadamard (static) |
| Calibration | ~512 sample calibration pass | Zero-shot |
| Accuracy @ 4-bit | ~99.1% of FP16 baseline | ~98.6% of FP16 baseline |
| Best for | Highest fidelity at same bit-width | Fastest turnaround |
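The fidelity gap comes down to how well the rotation conditions the weights before quantization. A minimal NumPy sketch of the general rotate-then-quantize idea (a random orthogonal matrix stands in for a learned rotor here; this is an illustration, not the RotorQuant implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, group_size=32):
    """Symmetric 4-bit grouped quantization, returned in dequantized form."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0  # int4 range [-7, 7]
    q = np.clip(np.round(groups / scale), -7, 7)
    return (q * scale).reshape(w.shape)

# Toy weight matrix with one strong outlier column, which would otherwise
# inflate the quantization scale of every group it touches.
w = rng.normal(size=(64, 64))
w[:, 0] *= 20.0

# Random orthogonal rotation: q_mat @ q_mat.T == I, so it is exactly invertible.
q_mat, _ = np.linalg.qr(rng.normal(size=(64, 64)))

plain_err = np.abs(quantize_4bit(w) - w).mean()
rotated_err = np.abs(quantize_4bit(w @ q_mat) @ q_mat.T - w).mean()

print(f"plain 4-bit error:   {plain_err:.3f}")
print(f"rotated 4-bit error: {rotated_err:.3f}")  # typically much lower
```

A learned rotor plays the same role but is optimized on calibration data rather than drawn at random, which is where the extra fidelity in the table comes from.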
## Memory Estimates (4-bit MLX)
| Context | Active memory (approx.) |
|---|---|
| 8K | ~228 GB |
| 32K | ~238 GB |
| 128K | ~268 GB |
| 256K | ~298 GB |
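For contexts between the rows above, piecewise-linear interpolation of the table gives a rough planning figure (these are approximations; actual usage also depends on batch size and KV-cache precision):

```python
# Published estimates from the table: context length (K tokens) -> GB.
MEMORY_GB = {8: 228, 32: 238, 128: 268, 256: 298}

def estimate_memory_gb(context_k: float) -> float:
    """Piecewise-linear interpolation over the published memory table."""
    points = sorted(MEMORY_GB.items())
    if context_k <= points[0][0]:
        return float(points[0][1])
    if context_k >= points[-1][0]:
        return float(points[-1][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= context_k <= x1:
            return y0 + (y1 - y0) * (context_k - x0) / (x1 - x0)

print(estimate_memory_gb(64))  # → 248.0 (between the 32K and 128K rows)
```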
## Hardware Requirements
- Minimum: Apple Silicon with 256 GB unified memory for short/medium contexts
- Recommended: 384 GB+ unified memory for full 256K context
- Does not fit on 96 GB / 128 GB / 192 GB Macs — use the 2-bit variant or a smaller model
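To check whether a given machine clears these thresholds, total physical memory (unified memory on Apple Silicon) can be read via `os.sysconf`; a small sketch, where the 298 GB threshold is the 256K-context estimate from the table above:

```python
import os

def total_memory_gb() -> float:
    """Total physical RAM in GB; on Apple Silicon this is the unified memory."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9

mem = total_memory_gb()
if mem < 298:
    print(f"{mem:.0f} GB: full 256K context likely won't fit; use a shorter context")
else:
    print(f"{mem:.0f} GB: should accommodate the full 256K context")
```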
## See Also
- RotorQuant MLX: 8-bit · 6-bit · 5-bit · 2-bit
- TurboQuant MLX 4-bit: majentik/Qwen3.5-397B-A17B-TurboQuant-MLX-4bit
- KV-cache wrapper: majentik/Qwen3.5-397B-A17B-RotorQuant
- Base model: Qwen/Qwen3.5-397B-A17B