---
license: apache-2.0
library_name: pytorch
tags:
  - sequence-modeling
  - physics-inspired
  - ripple-attention
  - causal-lm
  - pytorch
---

# RippleGPT: Physics-Inspired Language Modeling 🌊

RippleGPT is a novel Transformer architecture that replaces learned positional embeddings with a Decay-Biased Attention Mechanism (the Ripple Field) and uses Multiplicative Gating (RippleMLP) for improved signal flow.


## 🧪 The Scientific Breakthrough

Standard Transformers rely on absolute positional embeddings, which limits their ability to generalize to sequence lengths longer than those seen during training.

RippleGPT solves this via physics:

  1. Ripple Attention: Treats token influence as a magnetic field that decays with distance ($1/d$). This allows Length Extrapolation (training on 256 tokens, inference on 1024+).
  2. Ripple MLP: Replaces standard ReLU activations with Gated Multiplicative interactions, improving gradient flow in deep networks.
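The repository's actual implementation lives in `src/model.py`; as a rough illustration of the two ideas above, here is a minimal PyTorch sketch. The class names mirror the repo, but the exact bias form (a log-space $1/(1+d)$ decay added to the attention logits) and the gated-MLP layout are assumptions based on the description, not the repo's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RippleHead(nn.Module):
    """Attention head with a distance-decay bias instead of learned
    positional embeddings (sketch; the bias form is an assumption)."""

    def __init__(self, n_embd, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)  # (B, T, T)
        # Decay bias: a token's influence falls off with distance d,
        # here as log(1 / (1 + d)), i.e. -log(1 + d) in logit space.
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).clamp(min=0).float()
        att = att - torch.log1p(dist)
        # Causal mask: no attention to future tokens.
        att = att.masked_fill(pos[None, :] > pos[:, None], float("-inf"))
        return F.softmax(att, dim=-1) @ v


class RippleMLP(nn.Module):
    """Gated multiplicative MLP (sketch): two parallel projections
    combined elementwise, instead of a single ReLU branch."""

    def __init__(self, n_embd):
        super().__init__()
        self.gate = nn.Linear(n_embd, 4 * n_embd)
        self.up = nn.Linear(n_embd, 4 * n_embd)
        self.down = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

Because the decay bias is computed from token distances rather than looked up from a fixed-size embedding table, it is defined for any sequence length, which is what makes length extrapolation possible.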

## 📊 Performance (War and Peace Dataset)

In controlled tests at comparable scale (~10M parameters), RippleGPT converges faster and achieves lower validation loss than a standard GPT-2-style baseline.

*(Figure: training loss curves for RippleGPT vs. standard GPT.)*

| Model        | Parameters | Val Loss | Extrapolation |
|--------------|------------|----------|---------------|
| Standard GPT | ~9.9M      | 1.29     | ❌ Fails      |
| RippleGPT    | ~8.1M      | 1.20     | ✅ Works      |

*Note: RippleGPT achieves better validation loss with ~18% fewer parameters.*

## 🚀 Quick Start

```python
import torch
from src.model import RippleGPT, RippleConfig

# 1. Initialize
config = RippleConfig(vocab_size=65, block_size=256, n_layer=6, n_head=6, n_embd=384)
model = RippleGPT(config)
model.eval()

# 2. Inference (works on lengths > 256)
idx = torch.zeros((1, 1), dtype=torch.long)  # start token
generated = model.generate(idx, max_new_tokens=500)
```

## 📂 Repository Structure

- `src/model.py`: the core architecture (`RippleHead`, `RippleMLP`).
- `src/config.py`: configuration dataclass.
- `train.py`: training script for causal language modeling.

## 📜 Citation

If you find this architecture useful, please cite this repository.

```bibtex
@misc{tavernari2026ripplegpt,
  author       = {Tavernari, Victor Carvalho},
  title        = {RippleGPT: High-Efficiency Sequence Modeling via Decay-Biased Attention},
  year         = {2026},
  howpublished = {\url{https://github.com/Tavernari/RippleGPT}},
  publisher    = {GitHub},
  note         = {GitHub repository}
}
```