Update README.md
```yaml
library_name: transformers
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
```

## Model Details

The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".

Given the question, the reference label, and the response to be evaluated, the model judges whether the response is correct.

Demo:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
model = AutoModelForCausalLM.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")

PROMPT = '''
Given a problem, determine whether the final answer in the provided (incomplete) solution process matches the reference answer.
The reference answer may be one single option character (e.g., A, B, C, D), a numerical value, an expression, or a list of answers if multiple questions are involved.
**The reference answer may be in Chinese or another language, but your evaluation should be language-agnostic.**

Your task:
- Compare the final output of the solution process with the reference answer.
- If they **match exactly**, output **YES**.
- If they **do not match**, output **NO**.
- If the solution process is unclear, incomplete, or ambiguous, assume it is incorrect and output **NO**.

Your output must be strictly **'YES'** or **'NO'**, with no additional words, punctuation, or explanation.

---

**Question:**
{question}

**Solution Process (Final Step Only):**
{response}

**Reference Answer:**
{reference}

**Output:**
'''

question = "The founder of China's first public kindergarten teacher training school - Jiangxi Experimental Kindergarten Teacher School is ( )."
label = "Chen Heqin"
answer = "heqin chen"

prompt_question = PROMPT.format(question=question, reference=label, response=answer)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt_question},
]
# add_generation_prompt=True so the model generates an assistant reply
# rather than continuing the user turn
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, do_sample=False)
judgement = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print("Model judgement: ", judgement)
```
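To plug the judgement into an RL loop as a verifiable reward, the textual YES/NO verdict has to be mapped to a scalar. A minimal sketch of that mapping is below; the helper name and the 1.0/0.0 reward values are illustrative assumptions, not taken from the paper's released code:

```python
def judgement_to_reward(judgement: str) -> float:
    """Map the reward model's textual verdict to a scalar reward.

    The model is prompted to answer strictly 'YES' or 'NO'; we still
    normalize whitespace and case defensively before comparing.
    Note: the 1.0/0.0 values are an illustrative choice, not the
    paper's exact reward scheme.
    """
    verdict = judgement.strip().upper()
    if verdict == "YES":
        return 1.0
    # Treat 'NO' and any output that violates the contract as incorrect.
    return 0.0

print(judgement_to_reward("YES"))    # 1.0
print(judgement_to_reward(" no\n"))  # 0.0
```

Because decoding is greedy (`do_sample=False`) and the prompt forbids extra words, the generated text is normally exactly `YES` or `NO`, but the defensive fallback keeps training robust if the model ever deviates.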