xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-Overlap
Experimental checkpoint from "Data Overlap as a Post-Training Hyperparameter for Autoformalization." This is the SFT+GRPO variant with 100% data overlap (Qwen3-8B, thinking disabled): the control condition in which GRPO reuses the SFT data entirely. See the paper repo for details, results, and all artifacts.
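To make the "100% overlap" condition concrete, here is a minimal sketch of how a GRPO prompt set with a chosen overlap fraction could be drawn from an SFT pool. It is an illustration only, not the authors' pipeline: the function names (`build_grpo_split`, `overlap_fraction`), the example IDs, and the sampling scheme are all assumptions made for this example.

```python
import random


def build_grpo_split(sft_ids, fresh_ids, overlap, size, seed=0):
    """Pick `size` GRPO prompts; a fraction `overlap` is drawn from the SFT pool.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    rng = random.Random(seed)
    n_shared = round(overlap * size)   # prompts reused from SFT
    n_fresh = size - n_shared          # prompts unseen during SFT
    shared = rng.sample(sft_ids, n_shared)
    fresh = rng.sample(fresh_ids, n_fresh)
    return shared + fresh


def overlap_fraction(sft_ids, grpo_ids):
    """Fraction of GRPO prompts that also appear in the SFT set."""
    sft = set(sft_ids)
    return sum(1 for i in grpo_ids if i in sft) / len(grpo_ids)


# Made-up example IDs standing in for dataset rows.
sft_ids = [f"sft-{i}" for i in range(100)]
fresh_ids = [f"new-{i}" for i in range(100)]

# overlap=1.0 corresponds to the control condition described above:
# every GRPO prompt was already seen during SFT.
grpo_ids = build_grpo_split(sft_ids, fresh_ids, overlap=1.0, size=50)
print(overlap_fraction(sft_ids, grpo_ids))  # prints 1.0
```

Lowering `overlap` in this sketch replaces part of the reused SFT prompts with unseen ones, which is the knob the paper title refers to as a hyperparameter.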
Paper
This model is part of the experiments in:
SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
Xiaole Su, Kasey Zhang, Andy Lyu
https://arxiv.org/abs/2604.13515