DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3
EXL3 (ExLlamaV3) quantizations of llmfan46/gemma-3-12b-it-ultra-uncensored-heretic. All credit for the original model goes to the original authors.
Available Quantizations & VRAM
Each quantization is stored in its own branch; switch to the branch you want before downloading. Note: VRAM estimates include PyTorch context overhead (~0.8 GB) and assume an unquantized FP16 KV cache.
| Target BPW | Head BPW | Branch (Download Link) | WikiText-2 PPL (512 ctx)¹ | VRAM @ 2K ctx | VRAM @ 4K ctx | VRAM @ 8K ctx | VRAM @ 16K ctx | VRAM @ 32K ctx |
|---|---|---|---|---|---|---|---|---|
| 4.0 | h6 | 4.0bpw_h6 | 9.7746 | ~9.93 GB | ~10.68 GB | ~12.18 GB | ~15.18 GB | ~21.18 GB |
| 5.0 | h6 | 5.0bpw_h6 | 9.7767 | ~11.19 GB | ~11.94 GB | ~13.44 GB | ~16.44 GB | ~22.44 GB |
| 6.0 | h6 | 6.0bpw_h6 | 9.7294 | ~12.44 GB | ~13.19 GB | ~14.69 GB | ~17.69 GB | ~23.69 GB |
| 8.0 | h8 | 8.0bpw_h8 | 9.7336 | ~15.18 GB | ~15.93 GB | ~17.43 GB | ~20.43 GB | ~26.43 GB |
¹ Evaluated against WikiText-2 with ExLlamaV3 using a strided 512-token context window (-c 512) in llama.cpp parity mode (-g). Lower is better. (Higher BPW = higher quality; lower BPW = fits in less VRAM.)
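The VRAM columns are simple arithmetic: quantized weight size, plus the fixed ~0.8 GB runtime overhead, plus an FP16 KV cache that grows linearly with context. The Python sketch below reproduces the table's figures; the per-token KV-cache size and the weight footprints are back-solved from the table itself, so treat them as illustrative assumptions rather than published specs.

```python
# Rough VRAM estimator that reproduces the table above.
# Assumptions (back-solved from the table, not official numbers):
#   - fixed PyTorch/runtime overhead of ~0.8 GB
#   - FP16 KV cache of ~0.375 GB per 1K tokens for this model
#   - weight footprints = (2K-ctx column) - overhead - 0.75 GB KV

OVERHEAD_GB = 0.8
KV_GB_PER_1K = 0.375

WEIGHTS_GB = {4.0: 8.38, 5.0: 9.64, 6.0: 10.89, 8.0: 13.63}

def estimate_vram_gb(bpw: float, ctx_tokens: int) -> float:
    """Estimated total VRAM (GB) for a given quant and context length."""
    return WEIGHTS_GB[bpw] + OVERHEAD_GB + KV_GB_PER_1K * ctx_tokens / 1024

for ctx in (2048, 4096, 8192, 16384, 32768):
    print(f"4.0 bpw @ {ctx:>5} ctx: ~{estimate_vram_gb(4.0, ctx):.2f} GB")
```

As a rule of thumb, pick the highest BPW whose estimate at your target context length still leaves headroom on your card.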
How to Download
It's recommended to use `huggingface-cli` to download specific branches. (Do not use `git clone`, as it will download all branches!)
Ensure you have the CLI installed:
```sh
pip install -U "huggingface_hub[cli]"
```
Download a specific branch (e.g., 4.0bpw_h6):
```sh
# Example: downloading the 4.0bpw_h6 branch
huggingface-cli download DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3 --revision 4.0bpw_h6 --local-dir gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6
```
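If you prefer to script the download, the same branch selection is available from Python through huggingface_hub's snapshot_download (a standard API; the local directory name below is just an example):

```python
from huggingface_hub import snapshot_download

# Fetch a single quantization branch (revision) into a local directory.
snapshot_download(
    repo_id="DBMe/gemma-3-12b-it-ultra-uncensored-heretic-exl3",
    revision="4.0bpw_h6",  # branch name from the table above
    local_dir="gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6",
)
```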
Supported Engines
These models are highly optimized for modern GPUs and can be run using:
- TabbyAPI: A fast, OpenAI-compatible API server. (Set `model_name: "gemma-3-12b-it-ultra-uncensored-heretic-exl3-<BranchName>"` in your config.)
- Text-Generation-WebUI: A local web interface. (Select the `exllamav3` loader.)
- ExLlamaV3 (Native): Python library for custom integration (a minimal loading sketch follows this list).
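For the native route, here is a minimal generation sketch modeled on the quick-start pattern in the ExLlamaV3 repository (Config/Model/Cache/Tokenizer/Generator). Check the exact class and argument names against the ExLlamaV3 examples for your installed version; the model directory is a branch downloaded as shown above.

```python
from exllamav3 import Cache, Config, Generator, Model, Tokenizer

# Directory containing a downloaded quantization branch (see above).
model_dir = "gemma-3-12b-it-ultra-uncensored-heretic-exl3-4.0bpw_h6"

config = Config.from_directory(model_dir)
model = Model.from_config(config)
cache = Cache(model, max_num_tokens=8192)  # size the cache to your VRAM budget
model.load()
tokenizer = Tokenizer.from_config(config)

generator = Generator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Once upon a time,", max_new_tokens=64))
```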
Perplexity Degradation Curve
(Chart: WikiText-2 perplexity vs. BPW. The curve is essentially flat across the offered bitrates, 9.73–9.78 PPL from 8.0 down to 4.0 bpw; see the PPL column in the table above.)
Advanced: Quantization Environment & Settings
Quantization Settings
- Codebook: mcg
- Output Scales: always
- Calibration Rows: 250
- Calibration Cols: 2048
- Calibration Dataset: ExLlamaV3 default (Wiki/C4/Code)
- High Quality (HQ) Mode: False
- ExLlamaV3: 0.0.29 (commit cb1a436)
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition
Base model: google/gemma-3-12b-pt