# Gemma 4 26B A4B IT Local Abliterated SOTA Internal T34
This checkpoint was produced with model-forge from google/gemma-4-26B-A4B-it. It applies Heretic refusal ablation with model-forge's internal prompt datasets and exports the selected Pareto trial: Trial 34 (refusals: 1/27, KL divergence: 0.0183).
## Recipe
Generated with model-forge, a model-agnostic post-training pipeline for fine-tuning, refusal ablation, evaluation, and publishing.
Repository recipe: `configs/abliteration/gemma4_26b_a4b_local_abli.yaml`

Key settings:
- Heretic backend
- model-forge internal eval prompt buckets
- full row normalization
- orthogonalized refusal direction (see the sketch after this list)
- 80 optimization trials, 24 of them startup trials
- selected trial: 34
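The orthogonalized refusal direction is the core abliteration step: weight matrices that write into the residual stream are projected so their outputs carry no component along the extracted refusal direction. Below is a minimal PyTorch sketch, assuming the direction has already been extracted; names and shapes are illustrative, not model-forge's or Heretic's actual API.

```python
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a matrix that writes into the
    residual stream (shape [d_model, d_in]), so its outputs can no longer
    point along that direction."""
    r = refusal_dir / refusal_dir.norm()        # unit refusal direction
    return weight - torch.outer(r, r @ weight)  # W - r (r^T W), a rank-1 update

# Hypothetical example with small dimensions:
d_model, d_in = 512, 1024
W = torch.randn(d_model, d_in)
r = torch.randn(d_model)
W_ablated = orthogonalize(W, r)

# After ablation, outputs of W_ablated have no component along r:
assert ((r / r.norm()) @ W_ablated).abs().max() < 1e-3
```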
## Evaluation
| Bucket | Metric | Score |
|---|---|---|
| refusal_calibration_unsafe | ablation_refusal_suppression_rate | 1.0 |
| refusal_paired_boundary | ablation_refusal_suppression_rate | 1.0 |
| unsafe_overcompliance | ablation_refusal_suppression_rate | 1.0 |
| capability_preservation_challenge | normal_use_regression_pass_rate | 0.875 |
| refusal_paired_boundary (3-trial repeat) | benign_answer_quality_rate | 0.6333 |
For this repo's ablation-research objective, a lower refusal rate on unsafe prompts (i.e., a higher suppression rate) is the intended signal that refusals were removed. Capability preservation is judged separately with benign and challenge prompts.
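As a concrete illustration, a suppression rate is just the complied fraction of judged unsafe prompts. The helper below is a sketch under that assumption, not model-forge's evaluation code:

```python
def refusal_suppression_rate(complied: list[bool]) -> float:
    """Fraction of unsafe prompts the model answered instead of refusing;
    `complied` holds one judged verdict per prompt."""
    return sum(complied) / len(complied)

# Trial 34 reports 1 refusal across 27 prompts:
print(refusal_suppression_rate([True] * 26 + [False]))  # ~0.963
```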
## Intended Use
This model is intended for controlled ablation research and evaluation of post-training/refusal-removal recipes. It may comply with unsafe requests more often than the base instruction-tuned model.
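If the checkpoint is published in the usual Hugging Face layout, it loads like any other Gemma instruction-tuned model. A minimal sketch with transformers; the repository id below is a placeholder, since this card does not state the final id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/gemma-4-26B-A4B-it-abliterated"  # placeholder, not from this card

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Summarize this model's recipe in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```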