Independent verification: 99.5% ASR with capability preserved

#2
by DreamFast - opened

I compared 13 abliterated variants of Gemma 4 E2B across weight analysis, KL divergence (Heretic methodology, full 262K vocabulary), HarmBench safety evaluation (400 prompts, with a full LLM review of all 5,600 responses), and 8 benchmark tasks in native BF16. That took 44 GPU hours on a single RTX 5090. All 14 models (the 13 variants plus the base model) were tested with identical settings. The full comparison is at DreamFast/Gemma4-e2b-abliterlitics.
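For readers unfamiliar with the KL metric, here is a minimal sketch of a full-vocabulary KL divergence between a base model and an abliterated model. It assumes each model supplies raw next-token logits for the same prompt, and it computes KL(P_base || P_model) in nats, averaged over prompts. Which prompts, which token position, and which KL direction Heretic actually uses are assumptions here, not confirmed details. "Full vocab" means the sum runs over every token (262,144 for Gemma) rather than a truncated top-k. All function names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(base_logits: Sequence[float], model_logits: Sequence[float]) -> float:
    """KL(P_base || P_model) over the whole vocabulary, in nats (hypothetical helper)."""
    if len(base_logits) != len(model_logits):
        raise ValueError("both models must share the same vocabulary")
    lp = log_softmax(base_logits)
    lq = log_softmax(model_logits)
    # Sum p_base * (log p_base - log p_model) over every token, with no top-k truncation.
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def mean_kl(pairs: Sequence[tuple[Sequence[float], Sequence[float]]]) -> float:
    # Average the per-prompt KL over a prompt set (assumed to be harmless prompts).
    values = [kl_divergence(base, model) for base, model in pairs]
    return sum(values) / len(values)
```

Identical distributions give a KL of 0, and adding a constant to every logit leaves the softmax (and therefore the KL) unchanged. That is why this metric compares probability distributions rather than raw logits.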

Your reported KL divergence of 0.346 came in at 0.365 in our measurement, a close match. Your model reaches 99.5% HarmBench ASR with only a 1.0pp GSM8K drop from the base model, and its LAMBADA perplexity is just 1.17x that of the base, one of the best results in the comparison. That makes it the sweet spot for near-maximal safety removal without heavy capability trade-offs.
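The headline numbers reduce to simple arithmetic. Here is a minimal sketch, assuming ASR is the fraction of HarmBench prompts whose response was judged compliant, and assuming perplexity is exp of the mean per-token negative log-likelihood. The helper names and inputs are hypothetical, not taken from the comparison's tooling.

```python
import math
from typing import Sequence


def attack_success_rate(judgements: Sequence[bool]) -> float:
    # Fraction of prompts whose response was judged as complying with the harmful request.
    return sum(1 for j in judgements if j) / len(judgements)


def perplexity(token_nlls: Sequence[float]) -> float:
    # Perplexity = exp(mean per-token negative log-likelihood, in nats).
    return math.exp(sum(token_nlls) / len(token_nlls))


def perplexity_ratio(model_nlls: Sequence[float], base_nlls: Sequence[float]) -> float:
    # A ratio of 1.0 means no perplexity change relative to the base model.
    return perplexity(model_nlls) / perplexity(base_nlls)


# 99.5% ASR on 400 prompts corresponds to 398 compliant responses.
print(attack_success_rate([True] * 398 + [False] * 2))
# A 1.17x perplexity ratio means the mean NLL rose by ln(1.17), about 0.157 nats per token.
print(math.log(1.17))
```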

Thanks for the writeup! If you are interested, here is a related treatment applied to the 12B model: https://huggingface.co/TrevorJS/gemma-4-12B-it-uncensored
