Add model card and metadata for Rethinking Generalization in Reasoning SFT
This PR adds a model card for the research presented in the paper [Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability](https://huggingface.co/papers/2604.06628).
The model card includes:
- Relevant metadata: `pipeline_tag`, `library_name`, and `license`.
- Links to the paper and the official GitHub repository.
- A summary of the key findings regarding reasoning SFT generalization.
- Citation information for researchers.
README.md
ADDED

@@ -0,0 +1,37 @@
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- reasoning
+- sft
+- chain-of-thought
+---
+
+# Rethinking Generalization in Reasoning SFT
+
+This repository contains model checkpoints associated with the paper **"Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"**.
+
+The study investigates the prevailing narrative that supervised fine-tuning (SFT) primarily leads to memorization, whereas reinforcement learning (RL) drives generalization. By analyzing reasoning SFT with long chain-of-thought (CoT) supervision, the researchers demonstrate that cross-domain generalization is jointly shaped by optimization dynamics, training data quality, and base-model capability.
+
+## Resources
+- **Paper:** [arXiv:2604.06628](https://huggingface.co/papers/2604.06628)
+- **GitHub Repository:** [Nebularaid2000/rethink_sft_generalization](https://github.com/Nebularaid2000/rethink_sft_generalization)
+- **Collection:** [Hugging Face Model Collection](https://huggingface.co/collections/jasonrqh/rethink-sft-generalization)
+
+## Key Findings
+1. **Dip-and-Recovery Pattern:** Cross-domain performance often degrades early in training before recovering and improving, so short-training checkpoints may underestimate generalization.
+2. **Data Quality and Structure:** Verified long-CoT traces yield consistent cross-domain gains, whereas low-quality solutions can hurt generalization.
+3. **Model Capability Scaling:** Stronger base models better internalize transferable procedural patterns (e.g., backtracking), whereas weaker ones tend to imitate surface-level verbosity.
+4. **Asymmetric Generalization:** While reasoning performance improves through SFT, safety alignment can degrade in tandem.
+
+## Citation
+If you use these models or the research in your work, please cite:
+```bibtex
+@article{ren2026rethinking_sft_generalization,
+  title={Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability},
+  author={Qihan Ren and Peng Wang and Ruikun Cai and Shuai Shao and Dadi Guo and Yuejin Xie and Yafu Li and Quanshi Zhang and Xia Hu and Jing Shao and Dongrui Liu},
+  journal={arXiv preprint arXiv:2604.06628},
+  year={2026}
+}
+```