Update README.md
```yaml
library_name: transformers
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
```

## Model Details

The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".

Given the question, the reference label, and the response to be evaluated, the model judges whether the response is correct.

Demo:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
model = AutoModelForCausalLM.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")

PROMPT = '''
Given a problem, determine whether the final answer in the provided (incomplete) solution process matches the reference answer.
The reference answer may be one single option character (e.g., A, B, C, D), a numerical value, an expression, or a list of answers if multiple questions are involved.
**The reference answer may be in Chinese or another language, but your evaluation should be language-agnostic.**

Your task:
- Compare the final output of the solution process with the reference answer.
- If they **match exactly**, output **YES**.
- If they **do not match**, output **NO**.
- If the solution process is unclear, incomplete, or ambiguous, assume it is incorrect and output **NO**.

Your output must be strictly **'YES'** or **'NO'**, with no additional words, punctuation, or explanation.

---

**Question:**
{question}

**Solution Process (Final Step Only):**
{response}

**Reference Answer:**
{reference}

**Output:**
'''

question = "The founder of China's first public kindergarten teacher training school - Jiangxi Experimental Kindergarten Teacher School is ( )."
label = "Chen Heqin"
answer = "heqin chen"

prompt_question = PROMPT.format(question=question, reference=label, response=answer)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt_question},
]
# add_generation_prompt=True so the model generates an assistant reply
# rather than continuing the user turn
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, do_sample=False)
judgement = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print("Model judgement: ", judgement)
```
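To plug the judgement into an RL loop as a verifiable reward, the textual YES/NO verdict has to be mapped to a scalar. A minimal sketch of that mapping is below; the helper name and the 1.0/0.0 reward values are illustrative assumptions, not taken from the paper's released code:

```python
def judgement_to_reward(judgement: str) -> float:
    """Map the reward model's textual verdict to a scalar reward.

    The model is prompted to answer strictly 'YES' or 'NO'; we still
    normalize whitespace and case defensively before comparing.
    Note: the 1.0/0.0 values are an illustrative choice, not the
    paper's exact reward scheme.
    """
    verdict = judgement.strip().upper()
    if verdict == "YES":
        return 1.0
    # Treat 'NO' and any output that violates the contract as incorrect.
    return 0.0

print(judgement_to_reward("YES"))    # 1.0
print(judgement_to_reward(" no\n"))  # 0.0
```

Because decoding is greedy (`do_sample=False`) and the prompt forbids extra words, the generated text is normally exactly `YES` or `NO`, but the defensive fallback keeps training robust if the model ever deviates.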