Qwen3.5-9B MTPLX Optimized Speed FP16

FP16 compatibility sibling for Youssofal/Qwen3.5-9B-MTPLX-Optimized-Speed, packaged for MTPLX native Multi-Token-Prediction speculative decoding on older Apple Silicon.

This variant keeps the same release model family as Qwen3.5-9B Optimized Speed. Packed quantized tensors stay packed; BF16 floating tensors are converted to FP16 so M1 and M2 Macs can use the FP16-friendly path without changing the artifact's intended speed/quality tier.

Run It

brew install youssofal/mtplx/mtplx
mtplx start
mtplx run "hello" --model Youssofal/Qwen3.5-9B-MTPLX-Optimized-Speed-FP16

For an OpenAI-compatible local server:

mtplx serve --model Youssofal/Qwen3.5-9B-MTPLX-Optimized-Speed-FP16 --profile sustained --max --port 8000 --no-stats-footer

Device Routing

  • M1/M2 Apple Silicon: MTPLX may prefer this FP16 sibling.
  • M3/M4/M5 Apple Silicon: MTPLX keeps the normal optimized artifact by default.
  • Explicit --model always wins.

Recommended Runtime Defaults

Setting Value
Backend qwen3-next-mtp
Default depth D2
Profile sustained
Precision policy preserve packed tensors; convert BF16 floats to FP16

Source Performance Baseline

These are the source artifact numbers used as the regression baseline. The FP16 variant should stay close, but it is primarily a device-compatibility release.

Mode TPS Verify time Acceptance
AR baseline 64.96 - -
D1 comparison 92.87 6.83s 0.9120
D2 promoted default 101.32 4.13s 0.9398, 0.8102
D3 comparison 96.30 4.58s 0.9278, 0.7732, 0.6443

Model Build

Component Format
Main body 6-bit MLX affine body with BF16 float leaves converted to FP16
MTP sidecar same MTP policy as source; BF16 float leaves converted to FP16
Packed quantized tensors preserved without requantization
Manifest MTPLX_FP16_CONVERSION_MANIFEST.json records tensor-level conversions

This is not a full-precision checkpoint. It is built for fast local use on Apple Silicon through MTPLX.

Downloads last month
94
Safetensors
Model size
9B params
Tensor type
F16
U32
F32
MLX
Hardware compatibility
Log In to add your hardware

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support