---
pipeline_tag: image-text-to-text
base_model: huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated
base_model_relation: quantized
library_name: llama.cpp
tags:
  - gguf
  - mxfp4
  - quantized
  - multimodal
  - abliterated
  - uncensored
---

<p align="center">
  <img src="./banner1.png" alt="Huihui Gemma 4 26B A4B GGUF banner" width="100%">
</p>

# Huihui Gemma 4 26B A4B IT Abliterated — GGUF Quantizations

This repository contains **GGUF / llama.cpp quantized builds** of:

[huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated)

These are **UD quantizations** prepared for efficient local inference with **llama.cpp**, including support for **multimodal image-text-to-text workflows** when used with the corresponding `mmproj` file.

## Overview

This release is designed for users who want to run the Huihui Gemma 4 26B A4B abliterated model locally with reduced VRAM and RAM requirements while preserving as much output quality as possible.

The quantization variants use an optimized tensor distribution strategy inspired by **Unsloth-style mixed-quality quantization recipes**, balancing model fidelity, speed, and memory efficiency across different hardware targets.

## Quick Start

1. Download the latest release of **llama.cpp**.
2. Download your preferred `.gguf` model file from this repository.
3. For multimodal inference, also download the matching `mmproj` file.
4. Run the model with llama.cpp using your preferred frontend or CLI.

Example:

```bash
./llama-cli \
  -m Huihui-Gemma-4-26B-A4B-it-abliterated-UD-Q4_K_XL.gguf \
  --mmproj mmproj-model.gguf \
  -p "Describe this image in detail."
```

Adjust the model filename and `mmproj` filename to match the files you downloaded.

## Which Quant Should I Choose?

Choose based on your available memory and quality target:

* **Higher-bit / larger quants**: Better quality, higher VRAM/RAM usage.
* **Mid-range quants**: Best balance for most local setups.
* **Lower-bit quants**: Faster and smaller, but with more quality loss.

For best results, use the largest quantization your hardware can comfortably run.
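As a rough guide, a GGUF file's size scales with the parameter count times the average bits per weight of the quant type. The sketch below estimates sizes for the 26B base model; the bits-per-weight figures are approximate averages for common llama.cpp quant types (an assumption for illustration), not exact values for the files in this repository:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# The bits-per-weight values are approximate averages for common
# llama.cpp quant types (illustrative assumptions), not measurements
# of this repository's files.

PARAMS = 26e9  # total parameter count of the 26B base model

APPROX_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}

def est_size_gib(quant: str) -> float:
    """Estimated file size in GiB for a given quant type."""
    return PARAMS * APPROX_BPW[quant] / 8 / 2**30

for q in APPROX_BPW:
    print(f"{q}: ~{est_size_gib(q):.1f} GiB")
```

Add a few GiB of headroom on top of the file size for the KV cache and context buffers when checking whether a quant fits in your VRAM or RAM.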

## Multimodal Usage

This model supports **image-text-to-text** inference when used with the appropriate multimodal projection file.

Make sure the `mmproj` file matches this model family. Using an incorrect projection file may result in broken or degraded vision-language behavior.

## Notes

* This is a **quantized GGUF release** of the fine-tuned model.
* Original model: [huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated](https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated)
* Runtime target: **llama.cpp**
* Format: **GGUF**
* Modality: **image-text-to-text**
* Quantization style: **UD / mixed tensor distribution**

## Disclaimer

This repository only provides quantized GGUF builds. Model behavior, alignment characteristics, and training details are inherited from the original base model and fine-tune.