DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3

EXL3 (ExLlamaV3) quantizations of llmfan46/gemma-3-12b-it-ultra-uncensored-heretic. All credit for the model goes to its original authors.

πŸ“Š Available Quantizations & VRAM

The model weights are stored in separate branches; switch to the branch you want before downloading. Note: VRAM estimates include PyTorch context overhead (~0.8 GB) and assume an unquantized FP16 KV cache.

| Target BPW | Head BPW | Branch (Download Link) | WikiText-2 PPL (512 ctx)¹ | 2K ctx | 4K ctx | 8K ctx | 16K ctx | 32K ctx |
|---|---|---|---|---|---|---|---|---|
| 4.0 | h6 | 4.0bpw_h6 | 9.7746 | ~9.93 GB | ~10.68 GB | ~12.18 GB | ~15.18 GB | ~21.18 GB |
| 5.0 | h6 | 5.0bpw_h6 | 9.7767 | ~11.19 GB | ~11.94 GB | ~13.44 GB | ~16.44 GB | ~22.44 GB |
| 6.0 | h6 | 6.0bpw_h6 | 9.7294 | ~12.44 GB | ~13.19 GB | ~14.69 GB | ~17.69 GB | ~23.69 GB |
| 8.0 | h8 | 8.0bpw_h8 | 9.7336 | ~15.18 GB | ~15.93 GB | ~17.43 GB | ~20.43 GB | ~26.43 GB |

ΒΉ Evaluated against WikiText-2 with ExLlamaV3 using a strided 512-token context window (-c 512) in llama.cpp parity mode (-g). Lower is better. (Higher BPW = higher quality, lower BPW = fits in less VRAM).

πŸ“₯ How to Download

It's recommended to use huggingface-cli to download a specific branch. (Do not use git clone, as it will download all branches!)

Ensure you have the CLI installed:

pip install -U "huggingface_hub[cli]"

Download a specific branch (e.g., 4.0bpw_h6):

# Example: Downloading the 4.0bpw_h6 branch
huggingface-cli download DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3 --revision 4.0bpw_h6 --local-dir gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6
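The same download can be done from Python via huggingface_hub's snapshot_download; a small sketch (the local_dir path is just an example):

```python
# Download one quantization branch programmatically.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3",
    revision="4.0bpw_h6",  # branch name from the table above
    local_dir="gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6",
)
```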

πŸ’» Supported Engines

These EXL3 quantizations are optimized for modern NVIDIA GPUs and can be run with:

  • TabbyAPI: A fast, OpenAI-compatible API server. (Set model_name: "gemma-3-12b-it-ultra-uncensored-heretic-exl3-<BranchName>" in your config; see the example request after this list.)
  • Text-Generation-WebUI: A local web interface. (Select the exllamav3 loader)
  • ExLlamaV3 (Native): Python library for custom integration.
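As an example of the TabbyAPI route: once the server is up with the model loaded, any OpenAI-compatible client works. A minimal sketch using requests; the host/port (TabbyAPI defaults to 5000) and the API key value are assumptions that depend on your TabbyAPI config:

```python
# Query a TabbyAPI instance through its OpenAI-compatible chat endpoint.
# Assumptions: TabbyAPI is running locally on its default port 5000,
# and "changeme" matches the api_key set in your TabbyAPI config.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    headers={"Authorization": "Bearer changeme"},
    json={
        "model": "gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```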

πŸ“ˆ Perplexity Degradation Curve

(Lower is better.) [Figure: perplexity degradation across quantization levels]

βš™οΈ Advanced: Quantization Environment & Settings

πŸ”¬ Quantization Settings

  • Codebook: mcg
  • Output Scales: always
  • Calibration Rows: 250
  • Calibration Cols: 2048
  • Calibration Dataset: ExLlamaV3 default (Wiki/C4/Code)
  • High Quality (HQ) Mode: False
  • ExLlamaV3 Version: 0.0.29 (commit cb1a436)
  • Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition
