--- base_model: allenai/Olmo-3-7B-Instruct tags: - rlvr - grpo - first-token-not-i license: apache-2.0 --- # yeahrlo/olmo3-instruct-notI-step40-4try RLVR (GRPO) fine-tuned from `allenai/Olmo-3-7B-Instruct` to avoid starting responses with "I". - Base: allenai/Olmo-3-7B-Instruct - Method: GRPO with first-token-not-I verifier - Checkpoint: step_40 ## Usage ```python from transformers import AutoTokenizer, AutoModelForCausalLM tok = AutoTokenizer.from_pretrained("yeahrlo/olmo3-instruct-notI-step40-4try") model = AutoModelForCausalLM.from_pretrained( "yeahrlo/olmo3-instruct-notI-step40-4try", torch_dtype="bfloat16", device_map="auto", ) ```