nielsr (HF Staff) committed
Commit f451c72 · verified · 1 Parent(s): 7f28170

Add model card and metadata for Rethinking Generalization in Reasoning SFT

This PR adds a model card for the research presented in the paper [Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability](https://huggingface.co/papers/2604.06628).

The model card includes:
- Relevant metadata: `pipeline_tag`, `library_name`, and `license`.
- Links to the paper and the official GitHub repository.
- A summary of the key findings regarding reasoning SFT generalization.
- Citation information for researchers.

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- reasoning
- sft
- chain-of-thought
---

# Rethinking Generalization in Reasoning SFT

This repository contains model checkpoints associated with the paper **"Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"**.

The study investigates the prevailing narrative that supervised fine-tuning (SFT) primarily leads to memorization, whereas reinforcement learning (RL) drives generalization. By analyzing reasoning SFT with long chain-of-thought (CoT) supervision, the researchers demonstrate that cross-domain generalization is jointly shaped by optimization dynamics, training data quality, and base-model capability.

## Resources
- **Paper:** [arXiv:2604.06628](https://huggingface.co/papers/2604.06628)
- **GitHub Repository:** [Nebularaid2000/rethink_sft_generalization](https://github.com/Nebularaid2000/rethink_sft_generalization)
- **Collection:** [Hugging Face Model Collection](https://huggingface.co/collections/jasonrqh/rethink-sft-generalization)

## Key Findings
1. **Dip-and-Recovery Pattern:** Cross-domain performance often degrades early in training before recovering and improving, meaning short-training checkpoints may underestimate generalization.
2. **Data Quality and Structure:** Verified long-CoT traces yield consistent cross-domain gains, whereas low-quality solutions can hurt generalization.
3. **Model Capability Scaling:** Stronger base models better internalize transferable procedural patterns (e.g., backtracking), whereas weaker ones tend to imitate surface-level verbosity.
4. **Asymmetric Generalization:** While reasoning performance improves through SFT, there can be a corresponding degradation in safety alignment.

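One practical implication of the dip-and-recovery finding is that checkpoint selection should rely on cross-domain evaluations taken throughout training rather than on an early snapshot. A minimal illustrative sketch (the scores below are hypothetical, not results from the paper):

```python
# Illustrative sketch of the "dip-and-recovery" pattern: cross-domain scores
# sampled early in training can understate final generalization, so select
# checkpoints using evaluations across the whole run.
# All numbers here are hypothetical, not taken from the paper.

def best_checkpoint(scores: dict[int, float]) -> int:
    """Return the training step whose cross-domain score is highest."""
    return max(scores, key=scores.get)

# Hypothetical cross-domain accuracy by training step: an early dip below
# the starting score, followed by recovery and net improvement.
cross_domain_acc = {0: 0.42, 500: 0.35, 1000: 0.38, 2000: 0.47, 4000: 0.51}

dip_step = min(cross_domain_acc, key=cross_domain_acc.get)
peak_step = best_checkpoint(cross_domain_acc)
print(f"dip at step {dip_step}, best checkpoint at step {peak_step}")
```

Stopping at the dip (step 500 in this toy trace) would discard a run that ultimately generalizes better than the base model.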
## Citation
If you use these models or the research in your work, please cite:
```bibtex
@article{ren2026rethinking_sft_generalization,
  title={Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability},
  author={Qihan Ren and Peng Wang and Ruikun Cai and Shuai Shao and Dadi Guo and Yuejin Xie and Yafu Li and Quanshi Zhang and Xia Hu and Jing Shao and Dongrui Liu},
  journal={arXiv preprint arXiv:2604.06628},
  year={2026}
}
```