nemotron3-nano-30b-a3b-spiral-step130 (LoRA)

LoRA adapter trained with the SPIRAL self-play RL framework on top of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.

Training

  • Base model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (30B-total / 3B-active MoE, reasoning-capable)
  • Renderer: nemotron3 (thinking enabled)
  • Environments: TicTacToe-v0, KuhnPoker-v1, SimpleNegotiation-v1 (self-play, role-conditioned advantage estimation / RAE)
  • LoRA rank: 64 (target_modules=all-linear, alpha 32)
  • Batch size: 128 self-play games
  • Max tokens per turn: 4096
  • Learning rate: 4e-5
  • Checkpointed at: training step 130 (of planned 400)
  • Training backend: Tinker (LoRA fine-tuning API from Thinking Machines Lab)
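
The role-conditioned advantage estimation (RAE) named in the Environments entry can be illustrated with a minimal sketch. It assumes the baseline is an exponential moving average of rewards kept separately for each (game, role) pair, and that the advantage is the reward minus that baseline. The class name, the `decay` value, and the update order are assumptions for illustration, not details taken from this training run.

```python
from collections import defaultdict


class RoleConditionedBaseline:
    """Keeps one running reward baseline per (game, role) pair (illustrative only)."""

    def __init__(self, decay: float = 0.95):
        # decay is an assumed hyperparameter, not a value from this model card
        self.decay = decay
        # unseen (game, role) pairs start with a baseline of 0.0
        self.baselines = defaultdict(float)

    def advantage(self, game: str, role: int, reward: float) -> float:
        key = (game, role)
        # update the moving-average baseline for this game and role
        updated = self.decay * self.baselines[key] + (1.0 - self.decay) * reward
        self.baselines[key] = updated
        # the advantage is the reward relative to the role-specific baseline
        return reward - updated


rae = RoleConditionedBaseline(decay=0.5)
first_player = rae.advantage("TicTacToe-v0", role=0, reward=1.0)
second_player = rae.advantage("TicTacToe-v0", role=1, reward=-1.0)
```

Keeping the baselines separate means a role with a structural edge, such as the first mover in TicTacToe, does not skew the advantage computed for the other role.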

Math benchmark results (step-130 vs base)

Benchmark        Base     Step-130   Δ (points)
AIME24           36.7%    36.7%       0.0
AMC23            67.1%    74.4%      +7.3
MATH500          89.0%    90.8%      +1.8
Minerva          29.4%    30.1%      +0.7
Olympiad-Bench   50.1%    53.2%      +3.1
Average          54.5%    57.0%      +2.5
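
As a quick consistency check, the Δ column and the Average row can be recomputed from the per-benchmark scores. This is a small sketch that uses only the numbers in the table and assumes the average is an unweighted mean over the five benchmarks.

```python
# (base, step-130) accuracy in percent, copied from the table above
scores = {
    "AIME24": (36.7, 36.7),
    "AMC23": (67.1, 74.4),
    "MATH500": (89.0, 90.8),
    "Minerva": (29.4, 30.1),
    "Olympiad-Bench": (50.1, 53.2),
}

# per-benchmark change in percentage points
deltas = {name: round(step - base, 1) for name, (base, step) in scores.items()}

# unweighted means across the five benchmarks
base_avg = round(sum(base for base, _ in scores.values()) / len(scores), 1)
step_avg = round(sum(step for _, step in scores.values()) / len(scores), 1)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
print(f"Average: {base_avg} -> {step_avg}")
```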

All evaluations used the nemotron3 renderer (thinking enabled), max_tokens 8192, the full test sets, and the same \boxed{} answer extraction for every benchmark.
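
For illustration, \boxed{} answer extraction is commonly done by taking the last \boxed{...} in the completion and matching braces so that nested LaTeX survives. The function below is a hypothetical sketch of that approach, not the extraction code used for these evaluations.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    collected = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # matching close brace found: everything before it is the answer
                return "".join(collected).strip()
        collected.append(ch)
    return None  # braces never balanced, e.g. output truncated at max_tokens


answer = extract_last_boxed(r"First \boxed{1}, but finally \boxed{\frac{3}{4}}.")
```

Returning None for unbalanced braces lets a completion that was cut off by the token limit be scored as incorrect instead of being partially parsed.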

Load

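from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model and apply the LoRA adapter in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    "maxbittker/nemotron3-nano-30b-a3b-spiral-step130",
    device_map="auto",   # place layers across the available devices
    torch_dtype="auto",  # use the dtype recorded in the checkpoint config
)
# The tokenizer comes from the base model repository.
tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")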

Or merge and save as a full model:

# Fold the LoRA weights into the base weights and remove the adapter wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("./nemotron3-spiral-step130-merged")

Status

Training is ongoing; further checkpoints are planned at steps 200, 300, and 400.
