Dataset: paperbd/paper_preference_150K-v1
Trained with verl for citation-chunk grounding on paper/query pairs.
Tags: HuggingFaceTB/SmolLM2-135M-Instruct, paperbd/paper_preference_150K-v1, sft-dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024, paperhound

The dataset contains only the positive cited chunks, not the full arXiv paper haystack, so this model is trained to emit known supporting chunks for a given paper/query pair.
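The run tag sft-dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024 suggests a DPO stage with beta = 0.1, but this is an assumption read off the name alone; the card does not spell out the objective. As a reference point only, here is a minimal plain-Python sketch of the standard per-pair DPO loss that such a beta would scale. It is not taken from verl or from this model's training code, and the function name dpo_loss is hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen and rejected
    completions under the policy being trained and a frozen reference model.
    beta=0.1 mirrors the "beta0.1" in the run tag, which is an assumption.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # completion over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # Loss is -log(sigmoid(z)); evaluate it without overflowing exp().
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When policy and reference agree exactly, the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifted toward the chosen chunk: loss drops below log(2).
print(dpo_loss(-5.0, -12.0, -8.0, -9.0))
```

Smaller beta keeps the policy closer to the reference model, because a larger log-probability margin is needed before the loss saturates.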
Base model
HuggingFaceTB/SmolLM2-135M