Gabliterated Model Series

Overview

With this model series, I introduce Gabliteration, a novel neural weight modification technique that advances beyond traditional abliteration methods through adaptive multi-directional projections with regularized layer selection. Gabliteration addresses a fundamental limitation of existing abliteration methods: they compromise model quality while attempting to modify specific behavioral patterns.

Refusal: 2/100
KL Div: 0.2691
Config:
    Samples: 400
    Skip: [0, 2]
    Layer: 0.45 (selected: 12)
    Scale: 0.55
    λ: 0.15
    k: 2
    β: 0.46
    Adaptive: True
    τ: 0.73

Technical Background

Building upon the foundational work of Arditi et al. (2024) on single-direction abliteration, Gabliteration extends the approach to a comprehensive multi-directional framework with theoretical guarantees. My method applies singular value decomposition to difference matrices between harmful and harmless prompt representations to extract multiple refusal directions.
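The direction-extraction step described above can be sketched with NumPy. This is a minimal illustration, not the reference implementation: the function names are hypothetical, the row-wise pairing of harmful and harmless activations is an assumption, and the ridge-style projector in `project_out` is one plausible way to combine a scale and a regularizer. The defaults mirror the Scale, λ, and k values from the config, but mapping them onto these arguments is also an assumption.

```python
import numpy as np

def extract_refusal_directions(harmful, harmless, k=2):
    """Return the top-k right singular vectors of the difference matrix.

    harmful, harmless: arrays of shape (n_samples, hidden_dim) holding
    hidden states at one layer. Pairing row i of each array is an
    assumption of this sketch, not something the model card specifies.
    """
    diff = harmful - harmless
    # Rows of vt are orthonormal directions ordered by singular value.
    _, singular_values, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k], singular_values[:k]

def project_out(weight, directions, scale=0.55, lam=0.15):
    """Attenuate the components of `weight` along `directions`.

    weight: (hidden_dim, d_in) matrix that writes into the residual stream.
    directions: (k, hidden_dim) matrix of refusal directions.
    The ridge-style projector below is an illustrative assumption.
    """
    k = directions.shape[0]
    gram = directions @ directions.T + lam * np.eye(k)
    projector = directions.T @ np.linalg.solve(gram, directions)
    return weight - scale * projector @ weight
```

With `scale=1.0` and `lam=0.0` this reduces to an exact orthogonal projection that removes the selected directions entirely; smaller scales and a positive regularizer only attenuate them.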

Dynamic Layer Selection

This model was created using fixed rather than dynamic layer selection: the layer fraction (0.45, see Config above) was set by empirical tuning.

Selected layer: 12 (out of 28 total layers)
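As a quick check of the arithmetic, a layer fraction of 0.45 over 28 layers gives 12.6, which truncates to the reported layer 12. The helper below is hypothetical, and the truncation rule is an assumption that happens to match the reported values; the model card does not state how the fraction is rounded or whether layer indices are zero-based.

```python
def layer_from_fraction(fraction, num_layers):
    """Map a depth fraction to a layer index by truncation (assumed rule)."""
    return int(fraction * num_layers)

selected = layer_from_fraction(0.45, 28)
print(selected)
```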

Citation

If you use these models, please cite the original research:

Gülmez, G. (2025). Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models. https://arxiv.org/abs/2512.18901

Acknowledgments

This work builds upon the foundational research by Arditi et al. (2024) on refusal direction identification in large language models.

Bias, Risks, and Limitations

This model has reduced safety filtering and may generate sensitive or controversial outputs. Use responsibly and at your own risk.

Model size: 0.4B params (Safetensors, tensor type BF16)