---
license: apache-2.0
datasets:
- HuggingFaceTB/smoltalk2_everyday_convs_think
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---

# lukmanaj/smollm3-sft-colab-merged

**smollm3-sft-colab-merged** is a LoRA fine-tune of **[`HuggingFaceTB/SmolLM3-3B-Base`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)**, trained with SFT on **[`HuggingFaceTB/smoltalk2_everyday_convs_think`](https://huggingface.co/datasets/HuggingFaceTB/smoltalk2_everyday_convs_think)** and then merged into a single checkpoint for easy inference.

- **Use case:** conversational, reflective, everyday reasoning
- **Method:** SFT + LoRA → merged with `peft`’s `merge_and_unload`
- **Author:** [@lukmanaj](https://huggingface.co/lukmanaj)

---

## 🚀 Quick start

```python
from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"

pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto"
)

out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True
)[0]["generated_text"]

print(out)
```

> Tip: On a CPU-only machine, you can drop `device_map`. To reduce memory use, pass `torch_dtype="auto"` to `pipeline(...)`, or `torch_dtype="auto"` and `low_cpu_mem_usage=True` when loading the model yourself with `AutoModelForCausalLM.from_pretrained`.
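The **Method** bullet says the LoRA adapters were folded into the base weights with `peft`’s `merge_and_unload`. As a rough, pure-Python illustration (the toy sizes, random values, and the standard LoRA scaling `alpha / r` are assumptions, not values read from this checkpoint), merging replaces each adapted layer's `W0 x + (alpha / r) · B A x` with a single weight `W0 + (alpha / r) · B A`, so both forms produce the same output up to floating-point rounding:

```python
# Minimal pure-Python sketch of what merging a LoRA adapter does to one linear layer.
# Assumptions (illustrative only): tiny random matrices, standard LoRA scaling
# alpha / r, and no dropout at inference time.
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add_scaled(X, Y, s):
    """Return X + s * Y elementwise."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in, r, alpha = 4, 6, 2, 4  # hypothetical toy sizes
scaling = alpha / r                 # LoRA scaling factor (2.0 here)

W0 = rand_matrix(d_out, d_in, rng)  # frozen base weight (d_out x d_in)
A = rand_matrix(r, d_in, rng)       # LoRA down-projection (r x d_in)
B = rand_matrix(d_out, r, rng)      # LoRA up-projection (d_out x r)
x = rand_matrix(d_in, 1, rng)       # one input column vector

# Unmerged forward pass: base path plus the scaled low-rank path.
y_adapter = add_scaled(matmul(W0, x), matmul(B, matmul(A, x)), scaling)

# Merged weight: W = W0 + scaling * (B @ A), so inference needs a single matmul.
W_merged = add_scaled(W0, matmul(B, A), scaling)
y_merged = matmul(W_merged, x)

max_diff = max(abs(a[0] - b[0]) for a, b in zip(y_adapter, y_merged))
print(f"max |unmerged - merged| = {max_diff:.2e}")
```

The toy `alpha / r` of 2 mirrors the rank 8 / alpha 16 setting listed in the training summary. In a real merge with bf16 weights, rounding may make merged and unmerged outputs agree closely rather than bit-for-bit.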
## 🧩 Training summary

- **Base model:** HuggingFaceTB/SmolLM3-3B-Base
- **Dataset:** HuggingFaceTB/smoltalk2_everyday_convs_think
- **Approach:** Supervised Fine-Tuning (SFT) with LoRA adapters, then merged
- **Intended behavior:** coherent, thoughtful conversational replies

**Suggested hyperparameters (typical)**

- **Optimizer:** AdamW
- **LR:** 2e-5
- **Scheduler:** linear decay
- **Batch size (effective):** 8
- **Epochs:** 3
- **LoRA:** rank 8, alpha 16, dropout 0.05

## 🔧 Reproduce the merge

The merged weights were produced with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # bake LoRA into the base

tok = AutoTokenizer.from_pretrained(base, use_fast=True)

model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")
```

## 🧠 Intended uses & limitations

### Intended uses

- Dialogue agents
- Everyday reasoning / reflective Q&A
- Creative writing prompts

### Limitations

- May hallucinate facts
- Not intended for safety-critical, medical, legal, or financial advice
- Output may contain biases from training data

## 💻 Framework versions

| Library | Version |
|---|---|
| TRL | 0.23.1 |
| Transformers | 4.57.0 |
| PyTorch | 2.6.0+cu124 |
| Datasets | 4.1.1 |
| Tokenizers | 0.22.1 |

## 📚 Citations

TRL

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## ❤️ Acknowledgements

Thanks to Hugging Face, the TRL & PEFT maintainers, and the SmolLM3 team.