---
license: apache-2.0
datasets:
- virtuoussy/Math-RLVR
- virtuoussy/Multi-subject-RLVR
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

## Model Details

The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".

Given a question, a reference answer (label), and a response to be evaluated, the model judges whether the response is correct.

Demo:

> ```python
> # Load model directly
> from transformers import AutoTokenizer, AutoModelForCausalLM
> 
> tokenizer = AutoTokenizer.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
> model = AutoModelForCausalLM.from_pretrained("virtuoussy/Qwen2.5-7B-Instruct-RLVR")
> 
> PROMPT = '''
> Given a problem, determine whether the final answer in the provided (incomplete) solution process matches the reference answer.  
> The reference answer may be one single option character (e.g., A, B, C, D), a numerical value, an expression, or a list of answers if multiple questions are involved.  
> **The reference answer may be in Chinese or another language, but your evaluation should be language-agnostic.**  
> 
> Your task:  
> - Compare the final output of the solution process with the reference answer.  
> - If they **match exactly**, output **YES**.  
> - If they **do not match**, output **NO**.  
> - If the solution process is unclear, incomplete, or ambiguous, assume it is incorrect and output **NO**.  
> 
> Your output must be strictly **'YES'** or **'NO'**, with no additional words, punctuation, or explanation.  
> 
> ---
> 
> **Question:**  
> {question}  
> 
> **Solution Process (Final Step Only):**  
> {response}  
> 
> **Reference Answer:**  
> {reference}  
> 
> **Output:**  
> '''
> 
> 
> question="The founder of China's first public kindergarten teacher training school - Jiangxi Experimental Kindergarten Teacher School is (  )."
> label="Chen Heqin"
> answer="heqin chen"
> 
> prompt_question = PROMPT.format(question=question, reference=label, response=answer)
> messages=[
>            {"role": "system", "content": "You are a helpful assistant."},
>            {"role": "user", "content": prompt_question},
>          ]
> # add_generation_prompt=True appends the assistant turn header so the model
> # generates a fresh reply instead of continuing the user message.
> input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
> # The judgement is a single token ("YES"/"NO"), so a small budget suffices.
> output = model.generate(input_ids, do_sample=False, max_new_tokens=8)
> judgement = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
> print("Model judgement:", judgement)
> ```
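When using this judge inside an RLVR training loop, the decoded string is typically mapped to a scalar reward. A minimal sketch of that mapping (the helper name and the strict YES/NO convention are assumptions for illustration, not an API from the paper):

```python
def judgement_to_reward(judgement: str) -> float:
    """Map the judge model's text output to a binary reward.

    Per the prompt, the model should emit strictly 'YES' or 'NO';
    we normalize whitespace/case and treat anything other than
    'YES' (including malformed output) as an incorrect response.
    """
    verdict = judgement.strip().upper()
    return 1.0 if verdict == "YES" else 0.0
```

Treating malformed output as 0.0 mirrors the prompt's instruction that unclear or ambiguous solutions are judged incorrect.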