yeahrlo/olmo3-instruct-notI-step40-4try

RLVR (GRPO) fine-tuned from allenai/Olmo-3-7B-Instruct to avoid starting responses with "I".

  • Base: allenai/Olmo-3-7B-Instruct

  • Method: GRPO with first-token-not-I verifier

  • Checkpoint: step_40

Usage


from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step40-4try")

model = AutoModelForCausalLM.from_pretrained(

    "yeahrlo/olmo3-instruct-notI-step40-4try",

    torch_dtype="bfloat16",

    device_map="auto",

)
Downloads last month
37
Safetensors
Model size
528k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for yeahrlo/olmo3-instruct-notI-step40-4try