smollm2-135m-instruct-paper-preference-150k-v1-sft-dpo-lr5e-6-ep1-beta0-1-lora16a32-seq1024

trained with verl for paper-query citation chunk grounding.

  • base model: HuggingFaceTB/SmolLM2-135M-Instruct
  • dataset: paperbd/paper_preference_150K-v1
  • training hyperparams: sft-dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024
  • local source folder: paperhound
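
the hyperparameter tag above can be unpacked into named fields. the sketch below is one possible reading, not the training config itself: `parse_run_tag` and its field names are hypothetical, and it assumes `lora16a32` means LoRA rank 16 with alpha 32, `beta` is the DPO beta, and `seq` is the maximum sequence length in tokens.

```python
import re

# Hypothetical helper: the field meanings are assumptions inferred from the tag,
# not taken from the actual training configuration.
_TAG_PATTERN = re.compile(
    r"^(?P<stages>[a-z]+(?:-[a-z]+)*)"      # training stages, e.g. "sft-dpo"
    r"-lr(?P<lr>\d+(?:\.\d+)?e-?\d+)"       # learning rate, e.g. "5e-6"
    r"-ep(?P<epochs>\d+)"                   # number of epochs
    r"-beta(?P<beta>\d+(?:\.\d+)?)"         # assumed: DPO beta
    r"-lora(?P<rank>\d+)a(?P<alpha>\d+)"    # assumed: LoRA rank and alpha
    r"-seq(?P<seq>\d+)$"                    # assumed: max sequence length
)


def parse_run_tag(tag: str) -> dict:
    """Split a run tag into a dict of typed hyperparameters."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized run tag: {tag!r}")
    return {
        "stages": match["stages"].split("-"),
        "learning_rate": float(match["lr"]),
        "epochs": int(match["epochs"]),
        "dpo_beta": float(match["beta"]),
        "lora_rank": int(match["rank"]),
        "lora_alpha": int(match["alpha"]),
        "max_seq_len": int(match["seq"]),
    }


config = parse_run_tag("sft-dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024")
print(config)
```

note that the model name writes `beta0-1` while the tag writes `beta0.1`; this sketch only handles the dotted form used in the bullet list.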

the dataset contains only the positive (cited) chunks, not the full arXiv paper haystack, so this model is trained to emit known supporting chunks for a paper/query pair rather than to search full papers for them.

model repo: Pradheep1647/smollm2-135m-instruct-paper-preference-150k-v1-sft-dpo-lr5e-6-ep1-beta0-1-lora16a32-seq1024