---
license: mit
language: en
tags:
- gemma3
- rlhf
- dpo
- slm
- tinystories
- alignment
model_type: gemma3
---
# Gemma3 270M: DPO-Aligned for Negative Sentiment Control
This repository contains a DPO-aligned version of the Gemma3-270M model. While the base model was trained on the TinyStories dataset to generate neutral or positive narratives, this version has been fine-tuned using Direct Preference Optimization (DPO) to steer its generation toward negative emotional outcomes, melancholy tones, and "unhappy endings."
## Model Lineage & Alignment
This model is a second-generation iteration of the original SFT (Supervised Fine-Tuning) checkpoint. The transition from the base model to this version was achieved through a preference-based alignment pipeline in the RLHF family:
- Base Model: Gemma3-270M (SFT Checkpoint)
- Tuning Method: Direct Preference Optimization (DPO)
- Alignment Goal: To shift the model's stochastic output toward a "Negative Sentiment" persona.
- Reference Anchor: The original SFT weights were used as a frozen reference to calculate the log-probability ratio, preventing catastrophic forgetting of the base language distribution.
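The log-probability-ratio mechanism described above corresponds to the standard DPO objective. Below is a minimal sketch of that loss, assuming per-sequence log-probabilities are already summed; the function name, the β value, and the dummy numbers are illustrative, not taken from this project's training code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: widen the chosen-vs-rejected margin of the
    policy, measured relative to the frozen SFT reference model."""
    policy_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_ratio - ref_ratio)
    return -F.logsigmoid(logits).mean()

# One dummy preference pair (chosen = negative-sentiment story,
# rejected = positive-sentiment story)
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
```

Because the reference ratios are subtracted out, the policy is only rewarded for moving *relative* to the SFT anchor, which is what prevents catastrophic forgetting of the base language distribution.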
## Architecture Specifications
The model utilizes a custom implementation of the Gemma3 architecture:
- Parameters: 270M (18 Transformer Blocks)
- Attention: Grouped Query Attention (GQA) with 1 KV group.
- Windowing: Sliding Window Attention (SWA) with a 512-token span.
- Positional Encoding: Rotary Positional Embeddings (RoPE).
- Context Window: 32,768 tokens (trained with 128-token block size).
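The specifications above can be summarized as a plain dictionary for reference; note that these field names are illustrative and do not necessarily match the actual attributes of `Gemma3Config`:

```python
# Illustrative architecture summary (field names are assumptions,
# not the exact Gemma3Config attributes)
gemma3_270m_spec = {
    "n_blocks": 18,          # transformer blocks
    "attention": "GQA",      # grouped query attention
    "n_kv_groups": 1,
    "sliding_window": 512,   # SWA span in tokens
    "pos_encoding": "RoPE",
    "max_context": 32_768,   # supported context window
    "train_block_size": 128, # block size used during training
}
```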
## Training & Hardware
- Dataset: Preference-paired subset of TinyStories (Chosen: Negative / Rejected: Positive).
- Optimizer: AdamW with Linear Warmup and Cosine Decay.
- Hardware: Single NVIDIA A100 GPU (40GB).
- Development Context: This project was developed at Tunica Tech as a case study in Small Language Model (SLM) alignment and Reinforcement Learning.
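The learning-rate schedule above (linear warmup followed by cosine decay) can be sketched as follows; the step counts and peak rate in the example are assumptions, not the project's actual hyperparameters:

```python
import math

def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: 100 warmup steps out of 1000, peak LR 3e-4 (hypothetical values)
peak = lr_at_step(100, 3e-4, 100, 1000)
final = lr_at_step(1000, 3e-4, 100, 1000)
```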
## Requirements

```bash
pip install git+https://huggingface.co/Shubhamw11/Gemma-270M-TinyStories
```
## How to use

```python
import tiktoken
import torch

from gemma3_tinystories import HFGemma3DPONegative, Gemma3Config

# Load the DPO-aligned model and its config from the Hub
config = Gemma3Config.from_pretrained("Shubhamw11/gemma-3-270m-dpo-negative")
model = HFGemma3DPONegative.from_pretrained("Shubhamw11/gemma-3-270m-dpo-negative", config=config).model
tokenizer = tiktoken.get_encoding("gpt2")

# Generate text
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

input_text = "Once upon a time, there was a little"
context = torch.tensor(tokenizer.encode(input_text), dtype=torch.long).unsqueeze(0).to(device)

response = model.generate(context, max_new_tokens=200, temperature=1.1, top_k=5)
print(tokenizer.decode(response.squeeze().tolist()))
```
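The `temperature` and `top_k` arguments passed to `generate` above control how each next token is sampled. A minimal sketch of one such decoding step, assuming a 1-D logits tensor (this is an illustration of top-k temperature sampling in general, not the model's internal `generate` implementation):

```python
import torch

def sample_top_k(logits, temperature=1.1, top_k=5):
    """One decoding step: rescale logits by temperature, keep only the
    top_k candidates, and sample from the renormalized distribution."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    return topk_idx[torch.multinomial(probs, num_samples=1)]

# The sampled token is always one of the 5 highest-scoring candidates
next_token = sample_top_k(torch.arange(10.0))
```

A higher `temperature` flattens the distribution (more varied stories), while a small `top_k` keeps generation from drifting into low-probability tokens.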