metadata
base_model: allenai/Olmo-3-7B-Instruct
tags:
- rlvr
- grpo
- first-token-not-i
license: apache-2.0
yeahrlo/olmo3-instruct-notI-step40-4try
RLVR (GRPO) fine-tuned from allenai/Olmo-3-7B-Instruct to avoid starting responses with "I".
Base: allenai/Olmo-3-7B-Instruct
Method: GRPO with first-token-not-I verifier
Checkpoint: step_40
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step40-4try")
model = AutoModelForCausalLM.from_pretrained(
"yeahrlo/olmo3-instruct-notI-step40-4try",
torch_dtype="bfloat16",
device_map="auto",
)