Qwopus3.6-27B-v2-oQ4-fp16-mtp

This repository contains an oMLX/oQ 4-bit quantized MLX version of Jackrong/Qwopus3.6-27B-v2.

This variant stores non-quantized weights in float16 (fp16) and preserves the MTP (multi-token prediction) weights.

Model lineage

Original model: Jackrong/Qwopus3.6-27B-v2
Quantized model: AbarthJoe/Qwopus3.6-27B-v2-oQ4-fp16-mtp
Quantization format: MLX / oMLX oQ
Relationship: Quantized derivative of the original model

Quantization details

Quantization tool: oMLX / oQ
Quantization level: oQ4
Preserve MTP weights: Yes
Non-quant weight dtype: float16 / fp16
Output format: MLX
Target platform: Apple Silicon

About this fp16 variant

This version keeps non-quantized weights in float16 instead of bfloat16, which is the default.

The goal is to test whether fp16 improves prefill / prompt processing speed on Apple Silicon while keeping oQ4 generation speed high.

Compared with the default oQ4-mtp version, this variant:

May provide faster prefill in some environments
May be less numerically stable than bfloat16
Should be tested carefully with long-context prompts
May behave differently depending on Apple Silicon generation and MLX/oMLX version
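
One common reason fp16 can be less stable than bfloat16 is its narrower exponent range: float16 tops out at 65504, while bfloat16 keeps the float32 exponent at the cost of fewer mantissa bits. The sketch below illustrates that trade-off with the Python standard library only. The helper names are hypothetical, the bfloat16 conversion is emulated by truncation rather than proper rounding, and none of this is taken from oMLX or MLX.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if value can be stored as a finite float16."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def to_bfloat16(value: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32.

    Assumption: truncation is used for simplicity; real hardware and
    libraries usually round to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    big = 70000.0  # an illustrative magnitude just above the fp16 limit
    print("fits in fp16:", fits_in_fp16(big))
    print("as bfloat16 (truncated):", to_bfloat16(big))
```

A value that overflows in fp16 still has a finite, if coarser, bfloat16 representation, which is why large intermediate values are the usual thing to watch for with this variant.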

If this fp16 variant shows unstable output, repeated text, degraded reasoning, or unusual long-context behavior, use the default oQ4-mtp version instead.
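
As a rough aid for spotting the "repeated text" symptom when comparing outputs, a simple n-gram counter can flag generations that loop. This is a minimal sketch, not a method used by the model authors: the function names, the 8-word window, and the threshold of 3 are all arbitrary assumptions to tune for your own prompts.

```python
from collections import Counter


def max_ngram_repeats(text: str, n: int = 8) -> int:
    """Return how often the most frequent n-word sequence occurs in text."""
    words = text.split()
    if len(words) < n:
        return 0
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(counts.values())


def looks_repetitive(text: str, n: int = 8, threshold: int = 3) -> bool:
    """Heuristic: flag text if any n-word sequence appears threshold+ times."""
    return max_ngram_repeats(text, n) >= threshold


if __name__ == "__main__":
    sample = "the model keeps saying the same thing again and again " * 4
    print(looks_repetitive(sample))
```

Running the same prompts through both the fp16 and the default variant and comparing the flag rate gives a quick, if crude, signal before reading outputs by hand.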

Expected use case

This is an experimental speed-focused fp16 variant.

It is mainly useful for users who want to benchmark:

oQ4 generation speed
fp16 non-quant weight behavior
MTP-preserved inference
Apple Silicon local model performance

Benchmark

Tested on a MacBook Pro with an M3 Max (40-core GPU).

Model            Context   Prompt processing   Token generation
oQ4 fp16 + MTP   1k        218.2 tok/s         19.7 tok/s
oQ4 fp16 + MTP   4k        210.9 tok/s         20.9 tok/s

Benchmark results may vary depending on hardware, software version, prompt type, context length, and runtime settings.
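
To put the throughput figures in terms of wall-clock time, the sketch below converts them into approximate prefill and generation latency. It assumes that "1k" and "4k" mean 1024 and 4096 prompt tokens, that speeds stay constant over the run, and it uses a hypothetical 256-token reply; none of these are stated in the benchmark itself.

```python
def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float, decode_tps: float):
    """Return (prefill_s, decode_s, total_s) from token counts and tok/s rates."""
    prefill_s = prompt_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps     # time to generate the reply
    return prefill_s, decode_s, prefill_s + decode_s


if __name__ == "__main__":
    OUTPUT_TOKENS = 256  # hypothetical reply length, not from the benchmark
    rows = [
        ("1k", 1024, 218.2, 19.7),  # assumes 1k = 1024 tokens
        ("4k", 4096, 210.9, 20.9),  # assumes 4k = 4096 tokens
    ]
    for label, prompt_tokens, pp, tg in rows:
        prefill_s, decode_s, total_s = estimate_latency(
            prompt_tokens, OUTPUT_TOKENS, pp, tg)
        print(f"{label}: prefill {prefill_s:.1f} s, "
              f"decode {decode_s:.1f} s, total {total_s:.1f} s")
```

Under these assumptions, prefill time grows roughly in proportion to prompt length, so prompt processing speed dominates for long contexts with short replies.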

Related models

Other oMLX/oQ quantized versions are available in this collection:

Qwopus oMLX oQ Quantized Models for Apple Silicon

Credits

Original model by Jackrong.

This quantized MLX/oQ fp16 variant was created and uploaded by AbarthJoe.

License

The original model is licensed under Apache-2.0.
This quantized version follows the same Apache-2.0 license where applicable.

Disclaimer

This is a community quantized model for research, experimentation, and local inference testing.

It has not been fully safety-evaluated or benchmarked across all tasks.
Please test carefully before using it for production, sensitive, or high-stakes use cases.
