yeahrlo/olmo3-instruct-notI-step40-4try

RLVR (GRPO) fine-tuned from allenai/Olmo-3-7B-Instruct to avoid starting responses with "I".

Base: allenai/Olmo-3-7B-Instruct
Method: GRPO with first-token-not-I verifier
Checkpoint: step_40

Usage


from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step40-4try")

model = AutoModelForCausalLM.from_pretrained(

    "yeahrlo/olmo3-instruct-notI-step40-4try",

    torch_dtype="bfloat16",

    device_map="auto",

)