---
license: mit
language: en
tags:
- gemma3
- rlhf
- dpo
- slm
- tinystories
- alignment
model_type: gemma3
---
# Gemma3 270M: DPO-Aligned for Negative Sentiment Control
This repository contains a DPO-aligned version of the Gemma3-270M model. While the base model was trained on the TinyStories dataset to generate neutral or positive narratives, this version has been fine-tuned using Direct Preference Optimization (DPO) to steer its generation toward negative emotional outcomes, melancholy tones, and "unhappy endings."
## Model Lineage & Alignment
This model is a second-generation iteration of the original SFT (Supervised Fine-Tuning) checkpoint. The transition from the base model to this version was achieved through a preference-based alignment pipeline in the RLHF family:
- Base Model: Gemma3-270M (SFT Checkpoint)
- Tuning Method: Direct Preference Optimization (DPO)
- Alignment Goal: To shift the model's stochastic output toward a "Negative Sentiment" persona.
- Reference Anchor: The original SFT weights were used as a frozen reference to calculate the log-probability ratio, preventing catastrophic forgetting of the base language distribution.
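The log-probability-ratio mechanism described above corresponds to the standard DPO objective. Below is a minimal sketch of that loss, assuming per-sequence log-probabilities are already summed; the function name, the β value, and the dummy numbers are illustrative, not taken from this project's training code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: widen the chosen-vs-rejected margin of the
    policy, measured relative to the frozen SFT reference model."""
    policy_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_ratio - ref_ratio)
    return -F.logsigmoid(logits).mean()

# One dummy preference pair (chosen = negative-sentiment story,
# rejected = positive-sentiment story)
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
```

Because the reference ratios are subtracted out, the policy is only rewarded for moving *relative* to the SFT anchor, which is what prevents catastrophic forgetting of the base language distribution.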
## Architecture Specifications
The model utilizes a custom implementation of the Gemma3 architecture:
- Parameters: 270M (18 Transformer Blocks)
- Attention: Grouped Query Attention (GQA) with 1 KV group.
- Windowing: Sliding Window Attention (SWA) with a 512-token span.
- Positional Encoding: Rotary Positional Embeddings (RoPE).
- Context Window: 32,768 tokens (trained with 128-token block size).
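The specifications above can be summarized as a plain dictionary for reference; note that these field names are illustrative and do not necessarily match the actual attributes of `Gemma3Config`:

```python
# Illustrative architecture summary (field names are assumptions,
# not the exact Gemma3Config attributes)
gemma3_270m_spec = {
    "n_blocks": 18,          # transformer blocks
    "attention": "GQA",      # grouped query attention
    "n_kv_groups": 1,
    "sliding_window": 512,   # SWA span in tokens
    "pos_encoding": "RoPE",
    "max_context": 32_768,   # supported context window
    "train_block_size": 128, # block size used during training
}
```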
## Training & Hardware
- Dataset: Preference-paired subset of TinyStories (Chosen: Negative / Rejected: Positive).
- Optimizer: AdamW with Linear Warmup and Cosine Decay.
- Hardware: Single NVIDIA A100 GPU (40GB).
- Development Context: This project was developed at Tunica Tech as a case study in Small Language Model (SLM) alignment and Reinforcement Learning.
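The learning-rate schedule above (linear warmup followed by cosine decay) can be sketched as follows; the step counts and peak rate in the example are assumptions, not the project's actual hyperparameters:

```python
import math

def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: 100 warmup steps out of 1000, peak LR 3e-4 (hypothetical values)
peak = lr_at_step(100, 3e-4, 100, 1000)
final = lr_at_step(1000, 3e-4, 100, 1000)
```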
## Requirements

```bash
pip install git+https://huggingface.co/Shubhamw11/Gemma-270M-TinyStories
```
## How to use

```python
import tiktoken
import torch

from gemma3_tinystories import HFGemma3DPONegative, Gemma3Config

# Load the DPO-aligned model and its config from the Hub
config = Gemma3Config.from_pretrained("Shubhamw11/gemma-3-270m-dpo-negative")
model = HFGemma3DPONegative.from_pretrained("Shubhamw11/gemma-3-270m-dpo-negative", config=config).model
tokenizer = tiktoken.get_encoding("gpt2")

# Generate text
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

input_text = "Once upon a time, there was a little"
context = torch.tensor(tokenizer.encode(input_text), dtype=torch.long).unsqueeze(0).to(device)

response = model.generate(context, max_new_tokens=200, temperature=1.1, top_k=5)
print(tokenizer.decode(response.squeeze().tolist()))
```
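The `temperature` and `top_k` arguments passed to `generate` above control how each next token is sampled. A minimal sketch of one such decoding step, assuming a 1-D logits tensor (this is an illustration of top-k temperature sampling in general, not the model's internal `generate` implementation):

```python
import torch

def sample_top_k(logits, temperature=1.1, top_k=5):
    """One decoding step: rescale logits by temperature, keep only the
    top_k candidates, and sample from the renormalized distribution."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    return topk_idx[torch.multinomial(probs, num_samples=1)]

# The sampled token is always one of the 5 highest-scoring candidates
next_token = sample_top_k(torch.arange(10.0))
```

A higher `temperature` flattens the distribution (more varied stories), while a small `top_k` keeps generation from drifting into low-probability tokens.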