Special-R1-Qwen2.5-7B-NoThink

A reasoning-enhanced language model fine-tuned from Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) for special education applications.

Model Description

This model is trained to provide direct, concise answers without explicit chain-of-thought reasoning steps (NoThink variant). It focuses on generating accurate responses efficiently.

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Training Steps: 300
  • Focus: Direct answer generation without verbose reasoning
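The core of GRPO is computing group-relative advantages: for each prompt, several responses are sampled, and each response's reward is normalized against the mean and standard deviation of its group, so no separate value network is needed. A minimal illustrative sketch of that normalization step (not the actual veRL training code; function name and epsilon are our own):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so responses are ranked relative to their peers.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Advantages within a group always sum to zero, which is what makes the signal "relative": only responses that beat their siblings are reinforced.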

Training Details

Training Configuration

  • Framework: veRL (Volcano Engine Reinforcement Learning)
  • Algorithm: GRPO
  • Batch Size: configured for a 4× GPU setup
  • Precision: bfloat16

Training Data

  • Educational reasoning tasks
  • Mathematical problem solving
  • General knowledge Q&A

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenLearnLM/special-r1-qwen2.5-7b-nothink"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native dtype (bfloat16)
    device_map="auto"     # place the model on available devices automatically
)

messages = [
    {"role": "user", "content": "What is the capital of France?"}
]

# Build the prompt with the model's chat template, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Model Variants

  • special-r1-qwen2.5-7b-nothink (this model): direct answers without explicit reasoning
  • special-r1-qwen2.5-7b-think: with chain-of-thought reasoning

Limitations

  • Trained primarily on English and Korean data
  • May not perform optimally on highly specialized domains outside training distribution
  • As an early checkpoint (step 300), performance may improve with continued training

Citation

If you use this model, please cite:

@misc{openlearnlm2025special,
  title={Special-R1: Reasoning Models for Education},
  author={OpenLearnLM Team},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/OpenLearnLM/special-r1-qwen2.5-7b-nothink}
}

License

This model is released under the Apache 2.0 License, following the base model's license.
