vanilla on gemma-4-E2B-it (s1K-1.1)

Vanilla LoRA fine-tune of Gemma 4 E2B on s1K-1.1 with no defense applied. Provided as the matched undefended baseline for ASIDE and V-rotation on the same recipe.

Training data and base model

Training recipe

LoRA (r=16) on the q/k/v/o attention projections; reasoning mode; tool augmentation with 50% benign and 30% adversarial examples; no rotation hook; 10 epochs on s1K-1.1.
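As a rough illustration of the adapter part of this recipe, the sketch below writes the LoRA settings as a peft configuration. Only r=16 and the q/k/v/o targets come from this card. The module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the task type are assumptions, and every other field is left at peft's defaults because the card does not state it. The exact configuration is in the linked GitHub repository.

```python
# Sketch of the adapter configuration implied by the recipe line above.
# Only r=16 and the q/k/v/o targets are taken from the model card.
LORA_KWARGS = {
    "r": 16,
    # Assumed module names for the q/k/v/o attention projections.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    # Assumed task type for a causal language model fine-tune.
    "task_type": "CAUSAL_LM",
}


def build_lora_config():
    """Return a peft.LoraConfig built from LORA_KWARGS.

    peft is imported lazily so the settings above can be inspected
    without the library installed.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

Calling `build_lora_config()` requires peft; the returned object could then be passed to `peft.get_peft_model` together with the base model.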

Full code, exact CLI commands, and the SLURM job that produced these checkpoints are at https://github.com/LucasStill/phi-rope.

Headline results

Held-out CoT-forgery attack success rate (ASR): 17±0% (n=50, mean over 3 seeds); accuracy: 37±3%. This is the reference baseline against which ASIDE and V-rotation (both 0% ASR) are compared.
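For readers who want to reproduce a "mean ± spread over seeds" figure of this form, here is a minimal sketch of the aggregation. It assumes that n is the number of held-out attack prompts per seed and that the ± term is the sample standard deviation across seeds; the card states neither, and the evaluation harness in the GitHub repository is the authority. The function names are hypothetical.

```python
from statistics import mean, stdev


def attack_success_rate(successes, n_prompts):
    """Fraction of attack prompts that succeeded for one seed."""
    if n_prompts <= 0:
        raise ValueError("n_prompts must be positive")
    return successes / n_prompts


def summarize_over_seeds(per_seed_successes, n_prompts):
    """Return (mean, sample std) of the per-seed attack success rates."""
    rates = [attack_success_rate(s, n_prompts) for s in per_seed_successes]
    # The sample standard deviation needs at least two seeds.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread


def format_percent(avg, spread):
    """Render a (mean, spread) pair in the 'X±Y%' style used above."""
    return f"{100 * avg:.0f}±{100 * spread:.0f}%"
```

The per-seed success counts are inputs to this sketch; they are not reported in this card.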

Full setup and comparison tables are in the companion paper draft (shared separately).

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E2B-it", torch_dtype="bfloat16", device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-4-E2B-it")
model = PeftModel.from_pretrained(
    base, "orailix/vanilla-gemma4-e2b-s1k", subfolder="seed0/final",   # swap seedN as needed
)

# Vanilla has no hook to install. Use the model normally.

Unlike the defended ASIDE and V-rotation companion models, this vanilla adapter needs no hook and no role ids at inference time; the LoRA adapter in this repo carries all the trained weights. In the defended models, the hook is parameter-free and only rewires forward passes, and role ids must be set so the hook knows which tokens to rotate. The prompt-segmentation utilities for that case are in experiments/tier3_sft_phi_rope.py (see encode_aside_string_split or encode_reasoning_string_split).
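Because the results are averaged over several seeds, it can be convenient to wrap the loading snippet above in a helper that selects the seed. The sketch below assumes every seed follows the `seedN/final` subfolder layout shown for seed 0; the helper names are hypothetical, and the model and adapter ids are copied from the snippet above.

```python
BASE_ID = "google/gemma-4-E2B-it"
ADAPTER_ID = "orailix/vanilla-gemma4-e2b-s1k"


def adapter_subfolder(seed):
    """Subfolder of the adapter for a given seed (assumed 'seedN/final' layout)."""
    if seed < 0:
        raise ValueError("seed must be non-negative")
    return f"seed{seed}/final"


def load_seed(seed):
    """Load the base model, tokenizer, and the LoRA adapter for one seed.

    Heavy imports are deferred so that adapter_subfolder stays usable
    without transformers or peft installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype="bfloat16", device_map="auto",
    )
    tok = AutoTokenizer.from_pretrained(BASE_ID)
    model = PeftModel.from_pretrained(
        base, ADAPTER_ID, subfolder=adapter_subfolder(seed),
    )
    return model, tok
```

For example, `model, tok = load_seed(0)` reproduces the snippet above, provided you have accepted the access conditions for the repositories involved.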

Companion repositories in this set

Citation

A formal write-up is in preparation. Until the paper is publicly available, please cite this repository via the GitHub link below.

Code and paper

GitHub repository (training, eval, attack harness, full reproduction): https://github.com/LucasStill/phi-rope
