⚠️ **UPDATE APRIL 09: V2 RELEASE (CRITICAL FIXES)**

The previous version was built on outdated llama.cpp kernels. This V2 release incorporates the latest architectural fixes for Gemma-4, including the `<unused24>` buffer fix and logit soft-capping.

# THE ULTIMATE IQ4_NL (V2) - APRIL 2026 FIXES

The previous quant is out of date and **missing critical architectural fixes**. This V2 release is the first to incorporate the Unsloth Dynamic 2.0 re-computed imatrix and the latest llama.cpp Gemma-4 patches.
## What's New in V2? (The Unsloth Special)

- **CRITICAL FIX:** `<unused24>` token buffer overlap fixed (no more random logic crashes).
- **KV-Cache Support:** Full attention rotation for heterogeneous iSWA (better long-context stability).
- **Vocab Fix:** Byte token handling added to the BPE de-tokenizer (fixes weird character output).
- **BOS Logic:** `add_bos` now defaults to `True` for native Gemma-4 behavior.
- **Logit Soft-Capping:** Correctly reads `final_logit_softcapping` (CRITICAL for preventing NaN/Infinity loops).
- **Newline Logic:** Custom newline split logic for superior formatting.
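For the curious: logit soft-capping squashes logits smoothly into a bounded range before sampling, which is what prevents the NaN/Infinity loops mentioned above. A minimal sketch — the cap value `30.0` here is purely illustrative; in practice the runtime reads the real value from the model's `final_logit_softcapping` metadata:

```python
import math

def soft_cap(logits, cap=30.0):
    """Soft-cap logits as cap * tanh(x / cap).

    Every output stays strictly inside (-cap, cap), so a later
    softmax/exp can never overflow to Infinity or produce NaN.
    The cap of 30.0 is an assumed placeholder, not the model's value.
    """
    return [cap * math.tanh(x / cap) for x in logits]

# Even an absurdly large logit is smoothly clamped near the cap:
print(soft_cap([0.0, 1.0, 1e9]))
```

Near zero the function is almost the identity (`tanh(x) ≈ x`), so normal logits are barely disturbed; only extreme values get compressed.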
## Why this Quant?

- **Format:** `IQ4_NL` (non-linear) - higher logic-per-bit than `Q4_K_M` or `Q4_0`.
- **Brain:** High-res `Q8_0` token embeddings and output head (prevents vocabulary degradation).
- **Speed:** Specially optimized for CPU/laptop P-cores (~7 t/s on a 12th Gen i3 with dual-channel DDR4-3200, CPU-only inference!).
- **Matrix:** Built with the latest April 07 re-computed Unsloth imatrix.
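Why "non-linear" buys extra fidelity per bit: instead of 16 evenly spaced levels like `Q4_0`, `IQ4_NL` maps each 4-bit index through a tuned value table that spends more levels where weights actually cluster. A simplified sketch — the 16-entry table below is the non-linear codebook llama.cpp uses for IQ4_NL (`kvalues_iq4nl`), but the quantize/dequantize helpers are illustrative toys, not the real block kernel:

```python
# Non-linear 4-bit codebook (as used by llama.cpp's IQ4_NL):
IQ4_NL_VALUES = [-127, -104, -83, -65, -49, -35, -22, -10,
                 1, 13, 25, 38, 53, 69, 89, 113]

def quantize(x, scale, codebook=IQ4_NL_VALUES):
    """Return the 4-bit index of the codebook entry nearest x/scale."""
    target = x / scale
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - target))

def dequantize(idx, scale, codebook=IQ4_NL_VALUES):
    """Recover an approximate float weight from a 4-bit index."""
    return codebook[idx] * scale

# Round-trip a weight through the 4-bit representation:
idx = quantize(1.13, 0.01)
print(idx, dequantize(idx, 0.01))
```

Note how the table packs entries densely near zero (steps of ~12) and spaces them out toward ±127, matching the roughly bell-shaped distribution of real weights.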
All credits to Merijn Hendriks (nohurry) for the heretic weights and Unsloth for the imatrix file.
**Base model:** google/gemma-4-26B-A4B-it