These are MXFP4 quantizations of the gemma-4-26B-A4B-it model.
Quick Start
- Download the latest release of llama.cpp.
- Download your preferred model variant from below.
- For the mmproj file, the F32 version is recommended for the best visual processing results. Quality order: F32 > BF16 > F16.
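The mmproj preference above can be expressed as a small helper. This is only a sketch: `pick_mmproj` and the file names in the example are hypothetical and do not reflect the actual file names in this repository.

```python
import re

# Preference order stated in the Quick Start: F32 > BF16 > F16.
MMPROJ_PREFERENCE = ["F32", "BF16", "F16"]


def pick_mmproj(filenames):
    """Return the preferred mmproj file from a list of names, or None."""
    candidates = [f for f in filenames if "mmproj" in f.lower()]
    for precision in MMPROJ_PREFERENCE:
        # Match the precision as a whole token so "F16" does not match "BF16".
        pattern = re.compile(
            rf"(?<![A-Za-z0-9]){precision}(?![A-Za-z0-9])", re.IGNORECASE
        )
        for name in candidates:
            if pattern.search(name):
                return name
    return None


# Hypothetical file names, for illustration only.
files = [
    "model-MXFP4_MOE_Q8.gguf",
    "mmproj-F16.gguf",
    "mmproj-BF16.gguf",
]
print(pick_mmproj(files))
```

With no F32 file in the list, the helper falls back to the BF16 one. In practice the list of names could come from the repository file listing (for example via `huggingface_hub.list_repo_files`), but that step is left out here so the sketch runs offline.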
Which version should I choose?
All variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient. The difference lies in how the remaining tensors are handled:
| Variant | Quality | Performance | Recommendation |
|---|---|---|---|
| BF16 | ⭐⭐⭐ (Highest) | Variable* | Best for maximum accuracy; original unquantized weights. |
| F16 | ⭐⭐ (High) | Fast | Great alternative if BF16 is slow on your hardware. |
| Q8 | ⭐ (Standard) | Fastest | Balanced performance and memory usage. |
*Note: On some older architectures, BF16 may be slower than F16.
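For readers unfamiliar with MXFP4, the format used for the MoE weights in every variant, here is a minimal sketch of the idea as described in the OCP Microscaling (MX) specification: blocks of 32 values share one power-of-two scale, and each value is stored as a 4-bit float (E2M1). This is an illustration only, not the llama.cpp quantization routine; the function names are hypothetical, and rounding ties are resolved toward the smaller magnitude for simplicity.

```python
import math

# Magnitudes representable by FP4 (E2M1) in the OCP MX specification.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # number of elements sharing one scale in MXFP4
EMAX_ELEM = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def quantize_block(block):
    """Quantize one block to (shared_exponent, list of signed FP4 values)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the largest value fits the FP4 range.
    shared_exp = math.floor(math.log2(max_abs)) - EMAX_ELEM
    scale = 2.0 ** shared_exp
    quantized = []
    for v in block:
        # Clamp to the largest FP4 magnitude, then snap to the nearest one.
        target = min(abs(v) / scale, FP4_VALUES[-1])
        nearest = min(FP4_VALUES, key=lambda q: abs(q - target))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Reconstruct approximate values from the shared exponent and FP4 values."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


# Example block: values that FP4 can represent exactly survive the round trip.
block = [0.0, 1.0, -3.0, 6.0] + [0.0] * (BLOCK_SIZE - 4)
exp, q = quantize_block(block)
restored = dequantize_block(exp, q)
```

Storage per block is 32 × 4 bits for the elements plus 8 bits for the shared scale, which works out to 4.25 bits per weight.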
As per danielhanchen, I also updated the models. Please re-download them.
Base model: google/gemma-4-26B-A4B-it (model tree for noctrex/gemma-4-26B-A4B-it-MXFP4_MOE-GGUF)