virtuoussy committed (verified)
Commit fb58e0f · Parent(s): ac88ca7

Update README.md

Files changed (1): README.md (+57 −4)
@@ -1,13 +1,66 @@
- ---
+ ```yaml
  library_name: transformers
  license: apache-2.0
  language:
  - en
  base_model:
  - Qwen/Qwen2.5-7B-Instruct
- ---
-
+ ```

  ## Model Details

- The reward model used in paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
+ The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
+
+ Given a question, a reference answer (label), and a candidate response, the model judges whether the response is correct.
+
+ Demo:
+
+ ```python
+ # Load the judge model directly from the Hugging Face Hub
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
+ model = AutoModelForCausalLM.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
+
+ PROMPT = '''
+ Given a problem, determine whether the final answer in the provided (incomplete) solution process matches the reference answer.
+ The reference answer may be one single option character (e.g., A, B, C, D), a numerical value, an expression, or a list of answers if multiple questions are involved.
+ **The reference answer may be in Chinese or another language, but your evaluation should be language-agnostic.**
+
+ Your task:
+ - Compare the final output of the solution process with the reference answer.
+ - If they **match exactly**, output **YES**.
+ - If they **do not match**, output **NO**.
+ - If the solution process is unclear, incomplete, or ambiguous, assume it is incorrect and output **NO**.
+
+ Your output must be strictly **'YES'** or **'NO'**, with no additional words, punctuation, or explanation.
+
+ ---
+
+ **Question:**
+ {question}
+
+ **Solution Process (Final Step Only):**
+ {response}
+
+ **Reference Answer:**
+ {reference}
+
+ **Output:**
+ '''
+
+ question = "The founder of China's first public kindergarten teacher training school - Jiangxi Experimental Kindergarten Teacher School is (  )."
+ label = "Chen Heqin"
+ answer = "heqin chen"
+
+ prompt_question = PROMPT.format(question=question, reference=label, response=answer)
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": prompt_question},
+ ]
+ # add_generation_prompt=True appends the assistant turn header so the model generates a fresh reply
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+ output = model.generate(input_ids, do_sample=False)
+ # Decode only the newly generated tokens (the judgement), not the prompt
+ judgement = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
+ print("Model judgement:", judgement)
+ ```