nemotron3-nano-30b-a3b-spiral-step130 (LoRA)

LoRA adapter trained with the SPIRAL self-play RL framework on top of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.

Training

Base model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (30B-total / 3B-active MoE, reasoning-capable)
Renderer: nemotron3 (thinking enabled)
Environments: TicTacToe-v0, KuhnPoker-v1, SimpleNegotiation-v1 (self-play, role-conditioned advantage estimation / RAE)
LoRA rank: 64 (target_modules=all-linear, alpha 32)
Batch size: 128 self-play games
Max tokens per turn: 4096
Learning rate: 4e-5
Checkpointed at: training step 130 (of planned 400)
Training backend: Tinker (LoRA fine-tuning API from Thinking Machines Lab)

Math benchmark results (step-130 vs base)

Benchmark	Base	Step-130	Δ
AIME24	36.7%	36.7%	0.0
AMC23	67.1%	74.4%	+7.3
MATH500	89.0%	90.8%	+1.8
Minerva	29.4%	30.1%	+0.7
Olympiad-Bench	50.1%	53.2%	+3.1
Average	54.5%	57.0%	+2.5

All evals done with nemotron3 renderer (thinking enabled), max_tokens 8192, full test sets, unified \boxed{} answer extraction.

Load

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("maxbittker/nemotron3-nano-30b-a3b-spiral-step130",
                                                  device_map="auto",
                                                  torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")

Or merge and save as a full model:

merged = model.merge_and_unload()
merged.save_pretrained("./nemotron3-spiral-step130-merged")

Status

Training is ongoing — further checkpoints will land at step200, step300, step400.

Downloads last month: 8

Video Preview

Reinforcement Learning

Model tree for maxbittker/nemotron3-nano-30b-a3b-spiral-step130

Base model

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

Adapter

(75)

this model

Paper for maxbittker/nemotron3-nano-30b-a3b-spiral-step130

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 51