---
pipeline_tag: image-text-to-text
base_model: huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated
base_model_relation: quantized
library_name: llama.cpp
tags:
  - gguf
  - mxfp4
  - quantized
  - multimodal
  - abliterated
  - uncensored
---


# Huihui Gemma 4 26B A4B IT Abliterated — GGUF Quantizations

This repository contains GGUF / llama.cpp quantized builds of:

huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated

These are UD (Unsloth Dynamic-style) quantizations prepared for efficient local inference with llama.cpp, including support for multimodal image-text-to-text workflows when used with the corresponding mmproj file.

## Overview

This release is designed for users who want to run the Huihui Gemma 4 26B A4B abliterated model locally with reduced VRAM and RAM requirements while preserving as much output quality as possible.

The quantization variants use an optimized tensor distribution strategy inspired by Unsloth-style mixed-quality quantization recipes, balancing model fidelity, speed, and memory efficiency across different hardware targets.
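To make the idea of a mixed tensor distribution concrete, here is a minimal, purely illustrative sketch: it assigns higher-precision quant types to quality-sensitive tensors (embeddings, output head, attention) and lower-precision types to the bulky feed-forward weights. The rules, tensor-name patterns, and quant-type choices below are hypothetical examples, not the actual recipe used for this release.

```python
# Illustrative sketch of a mixed "UD-style" quant layout (hypothetical rules,
# not the actual recipe used for these files).

def pick_quant_type(tensor_name: str, default: str = "Q4_K") -> str:
    """Return a quant type for a tensor based on its role."""
    # Embeddings and the output head are typically kept at higher precision,
    # since errors there affect every token.
    if "token_embd" in tensor_name or "output" in tensor_name:
        return "Q6_K"
    # Attention projections are comparatively small but quality-sensitive.
    if "attn" in tensor_name:
        return "Q5_K"
    # Feed-forward / expert weights dominate model size, so they get
    # the lowest precision.
    if "ffn" in tensor_name:
        return "Q3_K"
    return default

for name in ["token_embd.weight", "blk.0.attn_q.weight", "blk.0.ffn_up.weight"]:
    print(name, "->", pick_quant_type(name))
```

The net effect is that most of the bytes go to tensors that tolerate aggressive quantization, while the few tensors that dominate output quality keep more bits.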

## Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred .gguf model file from this repository.
  3. For multimodal inference, also download the matching mmproj file.
  4. Run the model with llama.cpp using your preferred frontend or CLI.

Example (multimodal inference via llama.cpp's `llama-mtmd-cli`, which loads the projector; `input.jpg` is a placeholder image path):

```bash
./llama-mtmd-cli \
  -m Huihui-Gemma-4-26B-A4B-it-abliterated-UD-Q4_K_XL.gguf \
  --mmproj mmproj-model.gguf \
  --image input.jpg \
  -p "Describe this image in detail."
```

Adjust the model, mmproj, and image filenames to match the files you downloaded.

## Which Quant Should I Choose?

Choose based on your available memory and quality target:

  • Higher-bit / larger quants: Better quality, higher VRAM/RAM usage.
  • Mid-range quants: Best balance for most local setups.
  • Lower-bit quants: Faster and smaller, but with more quality loss.

For best results, use the largest quantization your hardware can comfortably run.
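As a rough rule of thumb, the file size (and hence the memory footprint, before KV cache and overhead) scales with total parameter count times bits per weight. The sketch below is a back-of-envelope estimate only: the `26e9` parameter count is taken from the model name, and the bits-per-weight figures are approximate averages for each quant family, not exact values for these specific files.

```python
# Back-of-envelope size estimate: total params * bits per weight.
# Both the parameter count and the bits-per-weight values are approximations.

PARAMS = 26e9  # total parameters (the "26B" in the model name)

APPROX_BITS_PER_WEIGHT = {
    "Q8_0":   8.5,
    "Q6_K":   6.6,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K":   3.4,
}

def approx_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a given quant density."""
    return params * bits_per_weight / 8 / (1024 ** 3)

for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    print(f"{quant}: ~{approx_size_gib(PARAMS, bpw):.1f} GiB")
```

Actual GGUF files deviate from these numbers because mixed recipes keep some tensors at higher precision, but the estimate is usually close enough to decide whether a quant fits your hardware.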

## Multimodal Usage

This model supports image-text-to-text inference when used with the appropriate multimodal projection file.

Make sure the mmproj file matches this model family: the projector is trained against a specific vision encoder and embedding space, so using a mismatched projection file may result in broken or degraded vision-language behavior.

## Notes

  • This is a quantized GGUF release of the fine-tuned model.
  • Original model: huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated
  • Runtime target: llama.cpp
  • Format: GGUF
  • Modality: image-text-to-text
  • Quantization style: UD / mixed tensor distribution

## Disclaimer

This repository only provides quantized GGUF builds. Model behavior, alignment characteristics, and training details are inherited from the original base model and fine-tune.