---
license: apache-2.0
datasets:
- HuggingFaceTB/smoltalk2_everyday_convs_think
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B-Base
---
# lukmanaj/smollm3-sft-colab-merged

**smollm3-sft-colab-merged** is a LoRA fine-tune of **[`HuggingFaceTB/SmolLM3-3B-Base`](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base)**, trained with supervised fine-tuning (SFT) on **[`HuggingFaceTB/smoltalk2_everyday_convs_think`](https://huggingface.co/datasets/HuggingFaceTB/smoltalk2_everyday_convs_think)** and then merged into a single checkpoint for easy inference.

- **Use case:** conversational, reflective, everyday reasoning
- **Method:** SFT + LoRA → merged with `peft`’s `merge_and_unload`
- **Author:** [@lukmanaj](https://huggingface.co/lukmanaj)

---

## 🚀 Quick start

```python
from transformers import pipeline

question = "If you could instantly master any skill, what would it be and why?"
pipe = pipeline(
    "text-generation",
    model="lukmanaj/smollm3-sft-colab-merged",
    device_map="auto"
)

out = pipe(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
    do_sample=True
)[0]["generated_text"]

print(out)
```
> Tip: For CPU-only inference, drop `device_map`. To reduce memory use when loading the model with `from_pretrained`, try `torch_dtype="auto"` and `low_cpu_mem_usage=True`.
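
The tip above can be sketched with an explicit `from_pretrained` call instead of `pipeline`. This is a minimal sketch, not the author's own loading code: `load_kwargs` and `generate_reply` are hypothetical helper names, and the chat-template step assumes the tokenizer in this repo ships a chat template.

```python
def load_kwargs(cpu_only: bool = False) -> dict:
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained."""
    kwargs = {
        "torch_dtype": "auto",      # reuse the dtype stored in the checkpoint
        "low_cpu_mem_usage": True,  # lower peak RAM while loading (needs `accelerate`)
    }
    if not cpu_only:
        kwargs["device_map"] = "auto"  # omitted for CPU-only, as the tip suggests
    return kwargs


def generate_reply(question: str, cpu_only: bool = False, max_new_tokens: int = 128) -> str:
    """Load the merged checkpoint and sample one reply to `question`."""
    # Imported lazily so load_kwargs() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "lukmanaj/smollm3-sft-colab-merged"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs(cpu_only))

    # Assumption: the tokenizer provides a chat template.
    inputs = tok.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("If you could instantly master any skill, what would it be and why?", cpu_only=True)` downloads the checkpoint on first use and returns the sampled reply as a string.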

## 🧩 Training summary
- **Base model:** `HuggingFaceTB/SmolLM3-3B-Base`
- **Dataset:** `HuggingFaceTB/smoltalk2_everyday_convs_think`
- **Approach:** Supervised fine-tuning (SFT) with LoRA adapters, then merged
- **Intended behavior:** coherent, thoughtful conversational replies

**Suggested hyperparameters (typical)**

- **Optimizer:** AdamW
- **Learning rate:** 2e-5
- **Scheduler:** linear decay
- **Effective batch size:** 8
- **Epochs:** 3
- **LoRA:** rank 8, alpha 16, dropout 0.05
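
As a rough illustration, the suggested values above could be mapped onto TRL and PEFT config objects as follows. This is a sketch under assumptions, not the recorded training script: the split of the effective batch size into 2 samples per device and 4 accumulation steps, the `adamw_torch` optimizer variant, the output directory, and the helper names are all hypothetical.

```python
# Values copied from the suggested (typical) list; they are not confirmed
# as the exact settings used for the released checkpoint.
HPARAMS = {
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "optim": "adamw_torch",  # assumption: PyTorch AdamW implementation
    # Assumption: 2 samples per device x 4 accumulation steps on one GPU.
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
}
LORA = {"r": 8, "lora_alpha": 16, "lora_dropout": 0.05}


def effective_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def lora_scaling(lora: dict) -> float:
    """Standard LoRA scales the low-rank update by alpha / r."""
    return lora["lora_alpha"] / lora["r"]


def build_configs(output_dir: str = "./smollm3-sft-colab"):
    """Return (SFTConfig, LoraConfig) built from the dictionaries above."""
    # Imported lazily so the arithmetic helpers work without trl/peft installed.
    from peft import LoraConfig
    from trl import SFTConfig

    sft_config = SFTConfig(output_dir=output_dir, **HPARAMS)
    lora_config = LoraConfig(task_type="CAUSAL_LM", **LORA)
    return sft_config, lora_config
```

With these values the effective batch size is 2 × 4 × 1 = 8 and the LoRA scaling factor is 16 / 8 = 2. The two config objects would then be passed to `SFTTrainer` as `args=` and `peft_config=`, together with a model and a training dataset.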

## 🔧 Reproduce the merge
The merged weights were produced with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "HuggingFaceTB/SmolLM3-3B-Base"
adapters = "lukmanaj/smollm3-sft-colab"

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapters)
model = model.merge_and_unload()  # bake LoRA into the base

tok = AutoTokenizer.from_pretrained(base, use_fast=True)
model.save_pretrained("./smollm3-sft-merged", safe_serialization=True)
tok.save_pretrained("./smollm3-sft-merged")
```
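
To show what merging does numerically, here is a toy pure-Python sketch of the standard LoRA merge rule, W' = W + (alpha / r) · B · A. The 2×2 weights and the helper names `matmul` and `merge_lora` are illustrative only and are not taken from this model or from `peft` internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * B @ A, the merged weight."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 layer with a rank-1 adapter (values are illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]    # shape (r, in_features) = (1, 2)
B = [[1.0], [2.0]]   # shape (out_features, r) = (2, 1)
r, alpha = 1, 2
x = [[3.0], [1.0]]   # input as a column vector

# Unmerged forward pass: base output plus the scaled adapter path.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]

# Merged forward pass: a single matmul with the folded weight.
merged = matmul(merge_lora(W, A, B, r, alpha), x)
assert merged == unmerged
```

Because the adapter is folded into the weight once, the merged model needs no `peft` dependency at inference time and adds no extra matmul per layer.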

## 🧠 Intended uses & limitations
**Intended uses**

- Dialogue agents
- Everyday reasoning / reflective Q&A
- Creative writing prompts

**Limitations**

- May hallucinate facts
- Not aligned for safety-critical, medical, legal, or financial advice
- Output may contain biases from training data

## 💻 Framework versions
| Library | Version |
| --- | --- |
| TRL | 0.23.1 |
| Transformers | 4.57.0 |
| PyTorch | 2.6.0+cu124 |
| Datasets | 4.1.1 |
| Tokenizers | 0.22.1 |
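
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: `EXPECTED` and `check_versions` are hypothetical names, and the PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to match the libraries listed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "trl": "0.23.1",
    "transformers": "4.57.0",
    "torch": "2.6.0",  # the local build tag (+cu124) is ignored in the comparison
    "datasets": "4.1.1",
    "tokenizers": "0.22.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed
        # Drop any local version suffix such as "+cu124" before comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (have, want)
    return mismatches
```

An empty dictionary from `check_versions()` means every listed package is installed at the listed version. Other versions may work equally well; the listed ones are simply those recorded for this card.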

## 📚 Citations
TRL

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

## ❤️ Acknowledgements
Thanks to Hugging Face, TRL & PEFT maintainers, and the SmolLM3 team.