---
base_model: allenai/Olmo-3-7B-Instruct
tags:
  - rlvr
  - grpo
  - first-token-not-i
license: apache-2.0
---

yeahrlo/olmo3-instruct-notI-step40-4try

Fine-tuned from allenai/Olmo-3-7B-Instruct with RLVR (reinforcement learning with verifiable rewards) using GRPO, trained to avoid starting responses with "I".
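The card does not include the verifier itself. As a rough illustration only, a binary "does not start with I" reward could look like the sketch below. The function name `first_token_not_i_reward` and the regex approximation of "first token" are assumptions, not the training code; a real verifier would more likely inspect the tokenizer's first generated token.

```python
import re

# Hypothetical approximation of "first token is I": after leading whitespace,
# the response opens with a standalone "I" (e.g. "I ", "I'm", "I,"), but not
# with words such as "In" or "It" that merely begin with the letter I.
_STARTS_WITH_I = re.compile(r"^\s*I(?![A-Za-z])")


def first_token_not_i_reward(response: str) -> float:
    """Return 1.0 if the response does not open with "I", else 0.0 (assumed binary reward)."""
    return 0.0 if _STARTS_WITH_I.match(response) else 1.0
```

For example, "I'm happy to help." scores 0.0, while "It depends." and "Sure, here it is." score 1.0.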

  • Base: allenai/Olmo-3-7B-Instruct
  • Method: GRPO with a first-token-not-I verifier
  • Checkpoint: step_40
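GRPO samples a group of completions per prompt and scores each completion relative to its group instead of using a learned value function. A minimal sketch of that group-relative advantage, assuming binary 0/1 rewards from the first-token-not-I verifier; the helper name, the `eps` value, and the group size of four are illustrative assumptions, not settings from this training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and standard deviation.

    `rewards` holds the verifier scores of several completions sampled for the
    same prompt. If every completion receives the same reward, all advantages
    are 0, so that prompt contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of 4 completions for one prompt: two opened with "I" (reward 0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that avoid "I" get a positive advantage (about +1 here) and those that open with "I" get a negative one (about -1). A full GRPO update also applies a clipped probability-ratio objective and, optionally, a KL penalty toward the reference model, both of which this sketch omits.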

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step40-4try")

model = AutoModelForCausalLM.from_pretrained(
    "yeahrlo/olmo3-instruct-notI-step40-4try",
    torch_dtype="bfloat16",  # load weights in bfloat16
    device_map="auto",       # spread layers across available devices (requires `accelerate`)
)
```