Qwen2.5-32B-GRPO / README.md
metadata
license: apache-2.0
datasets:
  - agentica-org/DeepScaleR-Preview-Dataset
language:
  - en
base_model:
  - Qwen/Qwen2.5-32B
pipeline_tag: reinforcement-learning

Description

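This repository contains the model weights released with the paper "Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning". Per the metadata above, the base model is Qwen/Qwen2.5-32B and the listed dataset is agentica-org/DeepScaleR-Preview-Dataset.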
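
For readers who want to try the checkpoint, the following is a minimal sketch rather than the authors' documented usage. It assumes the weights load as a standard causal language model through the `transformers` library, as the Qwen/Qwen2.5-32B base does. The repository id `jadohu/Qwen2.5-32B-GRPO` is inferred from the page header and may differ, and `build_prompt` is a hypothetical plain-text template, not the prompt format used in the paper. The official implementation linked below remains the reference for the actual setup.

```python
# Assumption: repository id inferred from the page header; adjust if it differs.
REPO_ID = "jadohu/Qwen2.5-32B-GRPO"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the paper's actual template may differ."""
    return f"Question: {question.strip()}\nAnswer: Let's think step by step."


def generate(question: str, repo_id: str = REPO_ID, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and generate a continuation for one question."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # A 32B model needs substantial GPU memory; device_map="auto" requires
    # the accelerate package and spreads the weights across available devices.
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 17 * 24?")` downloads the weights on first use, so it is best run on a machine with enough GPU memory for a 32B-parameter model.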

Official Implementation

https://github.com/akatigre/MASA-RL

Citation

@article{kim2025meta,
  title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
  author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
  journal={arXiv preprint arXiv:2510.03259},
  year={2025}
}