yeahrlo/olmo3-instruct-notI-step50-5try
RLVR (GRPO) fine-tuned from allenai/Olmo-3-7B-Instruct to avoid starting responses with "I".
Base: allenai/Olmo-3-7B-Instruct
Method: GRPO with first-token-not-I verifier
Checkpoint: step_40
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step50-5try")
model = AutoModelForCausalLM.from_pretrained(
"yeahrlo/olmo3-instruct-notI-step50-5try",
torch_dtype="bfloat16",
device_map="auto",
)
- Downloads last month
- 50
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for yeahrlo/olmo3-instruct-notI-step50-5try
Base model
allenai/Olmo-3-1025-7B Finetuned
allenai/Olmo-3-7B-Instruct-SFT Finetuned
allenai/Olmo-3-7B-Instruct-DPO Finetuned
allenai/Olmo-3-7B-Instruct