---
pipeline_tag: image-text-to-text
base_model: huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated
base_model_relation: quantized
library_name: llama.cpp
tags:
- gguf
- mxfp4
- quantized
- multimodal
- abliterated
- uncensored
---
<p align="center">
<img src="./banner1.png" alt="Huihui Gemma 4 26B A4B GGUF banner" width="100%">
</p>
# Huihui Gemma 4 26B A4B IT Abliterated — GGUF Quantizations
This repository contains **GGUF / llama.cpp quantized builds** of:
[huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated)
These are **UD quantizations** prepared for efficient local inference with **llama.cpp**, including support for **multimodal image-text-to-text workflows** when used with the corresponding `mmproj` file.
## Overview
This release is designed for users who want to run the Huihui Gemma 4 26B A4B abliterated model locally with reduced VRAM and RAM requirements while preserving as much output quality as possible.
The quantization variants use an optimized tensor distribution strategy inspired by **Unsloth-style mixed-quality quantization recipes**, balancing model fidelity, speed, and memory efficiency across different hardware targets.
## Quick Start
1. Download the latest release of **llama.cpp**.
2. Download your preferred `.gguf` model file from this repository.
3. For multimodal inference, also download the matching `mmproj` file.
4. Run the model with llama.cpp using your preferred frontend or CLI.
Example:
```bash
./llama-cli \
-m Huihui-Gemma-4-26B-A4B-it-abliterated-UD-Q4_K_XL.gguf \
--mmproj mmproj-model.gguf \
-p "Describe this image in detail."
```
Adjust the model filename and `mmproj` filename to match the files you downloaded.
## Which Quant Should I Choose?
Choose based on your available memory and quality target:
* **Higher-bit / larger quants**: Better quality, higher VRAM/RAM usage.
* **Mid-range quants**: Best balance for most local setups.
* **Lower-bit quants**: Faster and smaller, but with more quality loss.
For best results, use the largest quantization your hardware can comfortably run.
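The rule of thumb above can be sketched in Python: pick the largest quant file whose size, plus a safety margin for context and runtime overhead, still fits your memory budget. The quant names, sizes, and the 1.3× headroom factor below are illustrative assumptions, not measurements of the actual files in this repository.

```python
def pick_quant(quants, budget_gib, headroom=1.3):
    """Pick the highest-fidelity quant that fits in memory.

    quants: dict mapping quant name -> file size in GiB (placeholder values).
    budget_gib: available VRAM/RAM in GiB.
    headroom: assumed multiplier covering KV cache and runtime overhead.
    Returns the name of the largest fitting quant, or None if nothing fits.
    """
    fitting = {name: size for name, size in quants.items()
               if size * headroom <= budget_gib}
    if not fitting:
        return None
    # Largest file among the fitting ones = best quality that still fits.
    return max(fitting, key=fitting.get)

if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    sizes = {"UD-Q2_K_XL": 10.0, "UD-Q4_K_XL": 16.0, "UD-Q6_K_XL": 22.0}
    print(pick_quant(sizes, budget_gib=24.0))
```

With a 24 GiB budget, the 22 GiB quant is rejected (22 × 1.3 > 24) and the 16 GiB quant is chosen.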
## Multimodal Usage
This model supports **image-text-to-text** inference when used with the appropriate multimodal projection file.
Make sure the `mmproj` file matches this model family. Using an incorrect projection file may result in broken or degraded vision-language behavior.
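If vision output looks broken, one quick sanity check is that the downloaded files are valid, non-truncated GGUF files before suspecting a projection mismatch. Per the GGUF specification, every file begins with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. This minimal sketch only validates the header; it cannot confirm that an `mmproj` actually belongs to this model family.

```python
import struct

def check_gguf_header(path):
    """Return the GGUF format version, or None if the file is not valid GGUF.

    The GGUF spec defines the first 8 bytes as the magic b"GGUF" followed
    by a little-endian uint32 version number.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None  # truncated download or not a GGUF file
    (version,) = struct.unpack("<I", header[4:8])
    return version

# Usage (filename is whichever quant you downloaded):
# check_gguf_header("Huihui-Gemma-4-26B-A4B-it-abliterated-UD-Q4_K_XL.gguf")
```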
## Notes
* This is a **quantized GGUF release** of the fine-tuned model.
* Original model: [huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated)
* Runtime target: **llama.cpp**
* Format: **GGUF**
* Modality: **image-text-to-text**
* Quantization style: **UD / mixed tensor distribution**
## Disclaimer
This repository only provides quantized GGUF builds. Model behavior, alignment characteristics, and training details are inherited from the original base model and fine-tune.