Model Card for stage_1

This model is a fine-tuned version of unsloth/deepseek-r1-distill-llama-8b-bnb-4bit. It has been trained using TRL.

Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Repository ID as shown on this model's Hub page.
generator = pipeline("text-generation", model="qbz506/nyaya-deepseek-8b-stage1", device="cuda")
# Pass the prompt in chat format and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
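The base model belongs to the DeepSeek-R1 distill family, whose generations usually put a reasoning trace before a closing `</think>` tag. Whether this fine-tune keeps that format is an assumption, not something this card states. Under that assumption, a minimal sketch of a helper (the name `split_reasoning` is hypothetical) that separates the trace from the final answer in `output["generated_text"]`:

```python
OPEN, CLOSE = "<think>", "</think>"  # assumed tag format, inherited from the base model


def split_reasoning(text):
    """Split generated text into (reasoning, answer).

    Handles three cases: a complete <think>...</think> block, a trace whose
    opening tag was consumed by the chat template, and a trace that was cut
    off by max_new_tokens before the closing tag.
    """
    body = text.strip()
    had_open = body.startswith(OPEN)
    if had_open:
        body = body[len(OPEN):]
    if CLOSE in body:
        reasoning, answer = body.split(CLOSE, 1)
        return reasoning.strip(), answer.strip()
    if had_open:
        # Opening tag but no closing tag: generation stopped mid-reasoning.
        return body.strip(), ""
    # No tags at all: treat everything as the answer.
    return "", body
```

With `max_new_tokens=128` the trace may be truncated before any answer appears; in that case the second element is empty, and raising the token limit is the likely remedy.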

Training procedure

This model was trained with supervised fine-tuning (SFT).

Framework versions

  • PEFT: 0.18.1
  • TRL: 0.26.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0a0+b558c986e8.nv25.11
  • Datasets: 4.3.0
  • Tokenizers: 0.22.1
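
To check a local environment against the versions listed above, something like the following sketch may help. The version strings are copied from the list; the PyPI distribution names (for example `torch` for PyTorch) and the helper names are assumptions. The PyTorch entry looks like a vendor container build, so an exact match is unlikely outside that container.

```python
from importlib import metadata

# Versions copied from the list above; distribution names are assumed.
EXPECTED = {
    "peft": "0.18.1",
    "trl": "0.26.1",
    "transformers": "5.0.0",
    "torch": "2.10.0a0+b558c986e8.nv25.11",
    "datasets": "4.3.0",
    "tokenizers": "0.22.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=EXPECTED, lookup=installed_version):
    """Map each package name to (expected, installed, exact_match)."""
    report = {}
    for name, want in expected.items():
        have = lookup(name)
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in version_report().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```

A mismatch is not necessarily a problem; the report only flags where the environment differs from the one recorded here.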

Paper: Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

Citations

If you use this model/dataset, please cite:

@misc{sathish2026pramanafinetuninglargelanguage,
      title={Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya}, 
      author={Sharath Sathish},
      year={2026},
      eprint={2604.04937},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.04937}, 
}

Paper for qbz506/nyaya-deepseek-8b-stage1