DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3
EXL3 (ExLlamaV3) quantizations of llmfan46/gemma-3-12b-it-ultra-uncensored-heretic. All credit for the original model goes to the original authors.
Available Quantizations & VRAM
Each quantization is stored in its own branch; switch to the branch you want before downloading. Note: VRAM estimates include PyTorch context overhead (~0.8 GB) and assume an unquantized FP16 KV cache.
| Target BPW | Head BPW | Branch (Download Link) | WikiText-2 PPL (512 ctx)¹ | VRAM @ 2K ctx | VRAM @ 4K ctx | VRAM @ 8K ctx | VRAM @ 16K ctx | VRAM @ 32K ctx |
|---|---|---|---|---|---|---|---|---|
| 4.0 | h6 | 4.0bpw_h6 | 9.7746 | ~9.93 GB | ~10.68 GB | ~12.18 GB | ~15.18 GB | ~21.18 GB |
| 5.0 | h6 | 5.0bpw_h6 | 9.7767 | ~11.19 GB | ~11.94 GB | ~13.44 GB | ~16.44 GB | ~22.44 GB |
| 6.0 | h6 | 6.0bpw_h6 | 9.7294 | ~12.44 GB | ~13.19 GB | ~14.69 GB | ~17.69 GB | ~23.69 GB |
| 8.0 | h8 | 8.0bpw_h8 | 9.7336 | ~15.18 GB | ~15.93 GB | ~17.43 GB | ~20.43 GB | ~26.43 GB |
¹ Evaluated against WikiText-2 with ExLlamaV3 using a strided 512-token context window (-c 512) in llama.cpp parity mode (-g). Lower is better. (Higher BPW = higher quality; lower BPW = fits in less VRAM.)
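The VRAM columns are simple arithmetic: quantized weight size, plus the fixed ~0.8 GB runtime overhead, plus an FP16 KV cache that grows linearly with context. The Python sketch below reproduces the table's figures; the per-token KV-cache size and the weight footprints are back-solved from the table itself, so treat them as illustrative assumptions rather than published specs.

```python
# Rough VRAM estimator that reproduces the table above.
# Assumptions (back-solved from the table, not official numbers):
#   - fixed PyTorch/runtime overhead of ~0.8 GB
#   - FP16 KV cache of ~0.375 GB per 1K tokens for this model
#   - weight footprints = (2K-ctx column) - overhead - 0.75 GB KV

OVERHEAD_GB = 0.8
KV_GB_PER_1K = 0.375

WEIGHTS_GB = {4.0: 8.38, 5.0: 9.64, 6.0: 10.89, 8.0: 13.63}

def estimate_vram_gb(bpw: float, ctx_tokens: int) -> float:
    """Estimated total VRAM (GB) for a given quant and context length."""
    return WEIGHTS_GB[bpw] + OVERHEAD_GB + KV_GB_PER_1K * ctx_tokens / 1024

for ctx in (2048, 4096, 8192, 16384, 32768):
    print(f"4.0 bpw @ {ctx:>5} ctx: ~{estimate_vram_gb(4.0, ctx):.2f} GB")
```

As a rule of thumb, pick the highest BPW whose estimate at your target context length still leaves headroom on your card.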
How to Download
It's recommended to use `huggingface-cli` to download specific branches. (Do not use `git clone`, as it will download all branches!)
Ensure you have the CLI installed:
```sh
pip install -U "huggingface_hub[cli]"
```
Download a specific branch (e.g., 4.0bpw_h6):
```sh
# Example: downloading the 4.0bpw_h6 branch
huggingface-cli download DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3 --revision 4.0bpw_h6 --local-dir gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6
```
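If you prefer to script the download, the same branch selection is available from Python through huggingface_hub's snapshot_download (a standard API; the local directory name below is just an example):

```python
from huggingface_hub import snapshot_download

# Fetch a single quantization branch (revision) into a local directory.
snapshot_download(
    repo_id="DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3",
    revision="4.0bpw_h6",  # branch name from the table above
    local_dir="gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6",
)
```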
Supported Engines
These models are highly optimized for modern GPUs and can be run using:
- TabbyAPI: A fast, OpenAI-compatible API server. (Set `model_name: "gemma-3-12b-it-ultra-uncensored-heretic-exl3-<BranchName>"` in your config.)
- Text-Generation-WebUI: A local web interface. (Select the `exllamav3` loader.)
- ExLlamaV3 (Native): Python library for custom integration (a minimal loading sketch follows this list).
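For the native route, here is a minimal generation sketch modeled on the quick-start pattern in the ExLlamaV3 repository (Config/Model/Cache/Tokenizer/Generator). Check the exact class and argument names against the ExLlamaV3 examples for your installed version; the model directory is a branch downloaded as shown above.

```python
from exllamav3 import Cache, Config, Generator, Model, Tokenizer

# Directory containing a downloaded quantization branch (see above).
model_dir = "gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6"

config = Config.from_directory(model_dir)
model = Model.from_config(config)
cache = Cache(model, max_num_tokens=8192)  # size the cache to your VRAM budget
model.load()
tokenizer = Tokenizer.from_config(config)

generator = Generator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Once upon a time,", max_new_tokens=64))
```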
Perplexity Degradation Curve
(Chart: WikiText-2 perplexity vs. BPW. The curve is essentially flat across the offered bitrates, 9.73–9.78 PPL from 8.0 down to 4.0 bpw; see the PPL column in the table above.)
Advanced: Quantization Environment & Settings
Quantization Settings
- Codebook: mcg
- Output Scales: always
- Calibration Rows: 250
- Calibration Cols: 2048
- Calibration Dataset: ExLlamaV3 default (Wiki/C4/Code)
- High Quality (HQ) Mode: False
- ExLlamaV3: 0.0.29 (commit cb1a436)
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition
Base model: google/gemma-3-12b-pt