Qwopus3.6-27B-v2-oQ4-fp16-mtp

This repository contains an oMLX/oQ 4-bit quantized MLX version of Jackrong/Qwopus3.6-27B-v2.

This variant uses float16 / fp16 for non-quantized weights and preserves MTP weights.

Model lineage

  • Original model: Jackrong/Qwopus3.6-27B-v2
  • Quantized model: AbarthJoe/Qwopus3.6-27B-v2-oQ4-fp16-mtp
  • Quantization format: MLX / oMLX oQ
  • Relationship: Quantized derivative of the original model

Quantization details

  • Quantization tool: oMLX / oQ
  • Quantization level: oQ4
  • Preserve MTP weights: Yes
  • Non-quant weight dtype: float16 / fp16
  • Output format: MLX
  • Target platform: Apple Silicon

About this fp16 variant

This version keeps non-quantized weights in float16 instead of the default bfloat16 path.

The goal is to test whether fp16 improves prefill / prompt processing speed on Apple Silicon while keeping oQ4 generation speed high.

Compared with the default oQ4 MTP version:

  • May provide faster prefill in some environments
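  • May be less numerically stable than bfloat16, because float16 has a narrower exponent range (largest finite value 65504) and large intermediate values can overflow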
  • Should be tested carefully with long-context prompts
  • May behave differently depending on Apple Silicon generation and MLX/oMLX version

If this fp16 variant shows unstable output, repeated text, degraded reasoning, or unusual long-context behavior, use the default oQ4-mtp version instead.
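The stability trade-off described above comes from how the two 16-bit formats split their bits: float16 has more mantissa bits but a much smaller exponent range than bfloat16. The following is a minimal sketch using only the Python standard library. The helper names `to_fp16` and `to_bf16` are hypothetical, and the bfloat16 conversion is simulated by truncating a float32, which is an illustrative simplification and not how MLX or oMLX convert weights.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE float16 and back; return +/-inf if it overflows."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the float16 range.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Simulate bfloat16 by keeping the top 16 bits of a float32 (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (1.001, 65504.0, 70000.0):
    print(f"{value:>10}: fp16={to_fp16(value)!r}  bf16~={to_bf16(value)!r}")
```

Values above 65504 become infinite in float16 but stay finite (with coarser precision) in the simulated bfloat16, which is one plausible source of the unstable output mentioned above.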

Expected use case

This is an experimental, speed-focused fp16 variant.

It is mainly useful for users who want to benchmark:

  • oQ4 generation speed
  • fp16 non-quant weight behavior
  • MTP-preserved inference
  • Apple Silicon local model performance

Benchmark

Tested on a MacBook Pro with an M3 Max (40-core GPU).

Model            Context   Prompt processing   Token generation
oQ4 fp16 + MTP   1k        218.2 tok/s         19.7 tok/s
oQ4 fp16 + MTP   4k        210.9 tok/s         20.9 tok/s

Benchmark results may vary depending on hardware, software version, prompt type, context length, and runtime settings.
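To put the throughput figures in practical terms, they can be turned into rough wall-clock estimates. The sketch below assumes that "1k" and "4k" mean 1024 and 4096 prompt tokens and uses a hypothetical reply length of 256 tokens; the function name `estimate_latency` is illustrative. It ignores model load time and any run-to-run variation, so treat the results as ballpark values only.

```python
def estimate_latency(prompt_tokens, new_tokens, prefill_tps, gen_tps):
    """Return (prefill_seconds, generation_seconds, total_seconds)."""
    prefill_s = prompt_tokens / prefill_tps
    gen_s = new_tokens / gen_tps
    return prefill_s, gen_s, prefill_s + gen_s


# (prompt tokens, prompt processing tok/s, token generation tok/s)
# Throughput values are taken from the benchmark table; token counts are assumed.
BENCHMARKS = {
    "1k": (1024, 218.2, 19.7),
    "4k": (4096, 210.9, 20.9),
}

NEW_TOKENS = 256  # hypothetical reply length, not part of the benchmark

for label, (prompt_tokens, prefill_tps, gen_tps) in BENCHMARKS.items():
    prefill_s, gen_s, total_s = estimate_latency(
        prompt_tokens, NEW_TOKENS, prefill_tps, gen_tps
    )
    print(
        f"{label}: prefill {prefill_s:.1f} s, "
        f"generation {gen_s:.1f} s, total {total_s:.1f} s"
    )
```

With these numbers, generation time dominates for short prompts, while prefill becomes the larger share as the context grows.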

Related models

Other oMLX/oQ quantized versions are available in this collection:

Qwopus oMLX oQ Quantized Models for Apple Silicon

Credits

Original model by Jackrong.

This quantized MLX/oQ fp16 variant was created and uploaded by AbarthJoe.

License

The original model is licensed under Apache-2.0.
This quantized version follows the same Apache-2.0 license where applicable.

Disclaimer

This is a community quantized model for research, experimentation, and local inference testing.

It has not been fully safety-evaluated or benchmarked across all tasks.
Please test carefully before using it for production, sensitive, or high-stakes use cases.
