Nemotron Think Tokenizer

A byte-identical mirror of the nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 tokenizer, hosted under the geodesic-research namespace for stable referencing in our reasoning / thinking SFT pipelines. No modifications.

Why mirror?

The upstream NVIDIA tokenizer ships a chat template that supports <think>...</think> reasoning traces, which makes it the appropriate choice for models trained on reasoning data. We host an unmodified copy so that:

Our training configs can reference a stable geodesic-research/* path that won't shift if NVIDIA re-tags the upstream repo.
It pairs cleanly with geodesic-research/nemotron-instruct-tokenizer: this repo is the reasoning variant, while the instruct sibling removes think-tag injection from the chat template. Both share the same encoder.
A single naming convention (nemotron-think-* vs nemotron-instruct-*) makes it explicit at the config level which behavior a training run expects.

File	sha256	Source
`tokenizer.json`	`623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7`	upstream Super 120B BF16, verbatim
`tokenizer_config.json`	matches upstream	upstream Super 120B BF16, verbatim
`special_tokens_map.json`	matches upstream	upstream Super 120B BF16, verbatim
`chat_template.jinja`	`575fb74f54ed264df9047d0ecce3c98938aae953fb4f50356675706264cbb68a` (10771 B)	upstream Super 120B BF16, verbatim

The tokenizer.json blob is also byte-identical to nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16, and nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16 — the entire Nemotron 3 family shares one encoder.
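The digests in the table can be checked locally. The following is a minimal sketch, assuming the files have been downloaded to a local directory; the helper names (sha256_of, verify_dir, verify_mirror) are illustrative and not part of this repo. Only the two files with a published digest are checked, and verify_mirror additionally assumes huggingface_hub is installed and the network is reachable.

```python
import hashlib
from pathlib import Path

# Digests copied from the table above; the other two files list no digest.
EXPECTED_SHA256 = {
    "tokenizer.json": "623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7",
    "chat_template.jinja": "575fb74f54ed264df9047d0ecce3c98938aae953fb4f50356675706264cbb68a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dir(local_dir, expected=EXPECTED_SHA256):
    """Map each expected filename to True if its digest matches, else False."""
    return {
        name: sha256_of(Path(local_dir) / name) == want
        for name, want in expected.items()
    }


def verify_mirror(repo_id="geodesic-research/nemotron-think-tokenizer"):
    """Download the listed files (needs network) and verify their digests."""
    from huggingface_hub import snapshot_download  # lazy: optional dependency

    local_dir = snapshot_download(repo_id, allow_patterns=list(EXPECTED_SHA256))
    return verify_dir(local_dir)
```

Calling verify_mirror() should return a dict with True for both filenames if the local copy matches the table; any False entry means the downloaded file differs from the digest listed here.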

Chat template behavior

This is the upstream Nemotron 3 reasoning template. Default behavior:

enable_thinking defaults to True. The generation prompt ends at <|im_start|>assistant\n<think>\n to elicit a reasoning trace.
<think></think> is auto-prepended to assistant messages whose content lacks think tags — so {"role": "assistant", "content": "42"} renders as <|im_start|>assistant\n<think></think>42<|im_end|>.
reasoning_content field is supported. A message like {"role": "assistant", "reasoning_content": "let me check", "content": "42"} renders as <|im_start|>assistant\n<think>\nlet me check\n</think>\n42<|im_end|>.
truncate_history_thinking=True by default. Older assistant turns have their reasoning traces stripped and replaced with <think></think> stubs, keeping only the final answer in context.
low_effort=False by default. When set to True, appends \n\n{reasoning effort: low} to the last user message as a hint to the model to produce shorter chains of thought.

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")

msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4."},  # auto-prepended with <think></think>
    {"role": "user", "content": "And 3+3?"},
]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# Ends with: ...<|im_start|>assistant\n<think>\n
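
To make the per-message rules above concrete, here is a plain-Python restatement of them. It is a sketch for reading and for unit tests, not the Jinja template itself: the function names are hypothetical, detecting existing think tags by substring is an assumption, and history truncation and the enable_thinking=False prompt are deliberately left out because their exact rendering is not specified here.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_assistant(message):
    """Render one assistant message following the rules listed above."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # reasoning_content is wrapped in think tags ahead of the answer.
        body = f"<think>\n{reasoning}\n</think>\n{content}"
    elif "<think>" in content or "</think>" in content:
        # Assumption: content that already carries think tags is left as-is.
        body = content
    else:
        # No trace at all: an empty think stub is prepended.
        body = f"<think></think>{content}"
    return f"{IM_START}assistant\n{body}{IM_END}"


def add_low_effort_hint(messages):
    """Return a copy of messages with the low-effort hint on the last user turn."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] += "\n\n{reasoning effort: low}"
            break
    return out


def generation_prompt():
    """Generation prompt for the default enable_thinking=True case."""
    return f"{IM_START}assistant\n<think>\n"


print(render_assistant({"role": "assistant", "content": "42"}))
```

Comparing these strings against tok.apply_chat_template(..., tokenize=False) output is a quick way to confirm that a training pipeline sees the rendering it expects.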

When to use this tokenizer

Use case	Use this tokenizer?
Reasoning / thinking SFT (training data has `<think>...</think>` traces)	✅ Yes
Distillation from a reasoning teacher model	✅ Yes
Instruct SFT with no reasoning	❌ Use `geodesic-research/nemotron-instruct-tokenizer` instead — avoids stray `</think>` echoes at inference
Continued pretraining (CPT) on raw text	Either works — chat template is irrelevant for `.bin/.idx` data
Evaluating a reasoning-trained model with vLLM	✅ Yes
Evaluating an instruct (non-reasoning) model	❌ Use the instruct variant — this template emits `<think>\n` on the generation prompt, which mismatches an instruct model's training distribution
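
The SFT rows of the table can be turned into a small data-driven check. The sketch below is illustrative only: recommend_tokenizer and has_reasoning are hypothetical helpers, and the heuristic (any assistant turn with a reasoning_content field or a literal <think> tag counts as reasoning data) is an assumption rather than a rule from this repo.

```python
THINK_TOKENIZER = "geodesic-research/nemotron-think-tokenizer"
INSTRUCT_TOKENIZER = "geodesic-research/nemotron-instruct-tokenizer"


def has_reasoning(messages):
    """True if any assistant turn carries a reasoning trace."""
    return any(
        m.get("role") == "assistant"
        and bool(m.get("reasoning_content") or "<think>" in (m.get("content") or ""))
        for m in messages
    )


def recommend_tokenizer(conversations):
    """Pick the tokenizer repo that matches the SFT data, per the table above."""
    if any(has_reasoning(conv) for conv in conversations):
        return THINK_TOKENIZER
    return INSTRUCT_TOKENIZER
```

Running this over a sample of the training set before launching a job catches the mismatch the table warns about, such as instruct-only data paired with the reasoning template.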

Compatibility

vLLM: works out of the box. The config sets tokenizer_class to PreTrainedTokenizerFast, has no backend or is_local keys, and the repo contains no custom Python files. Compatible with transformers 4.57.x and 5.x.
HuggingFace generation: standard generate() works; <|im_end|> is registered as the eos_token.
Existing Nemotron checkpoints: vocab, merges, special tokens, and added-token IDs all match the entire Nemotron 3 family. Drop-in replacement at the encoder level.
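
The encoder-level claim can be spot-checked by encoding the same strings with two tokenizers and comparing the IDs. This is a minimal sketch: same_encoding, load_pair, and SAMPLES are hypothetical names, the sample strings are arbitrary, and load_pair assumes transformers is installed and both repos are reachable.

```python
SAMPLES = [
    "What's 2+2?",
    "<think>\nlet me check\n</think>\n42",
    "<|im_start|>assistant\n",
]


def same_encoding(tok_a, tok_b, samples):
    """Return the samples on which the two tokenizers disagree (empty = match)."""
    return [
        text
        for text in samples
        if tok_a.encode(text, add_special_tokens=False)
        != tok_b.encode(text, add_special_tokens=False)
    ]


def load_pair():
    """Load this tokenizer and its instruct sibling (needs network)."""
    from transformers import AutoTokenizer  # lazy: optional dependency

    think = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")
    instruct = AutoTokenizer.from_pretrained("geodesic-research/nemotron-instruct-tokenizer")
    return think, instruct
```

With think, instruct = load_pair(), an empty result from same_encoding(think, instruct, SAMPLES) is consistent with the shared-encoder claim on those samples, and think.eos_token can be compared against <|im_end|> in the same session. A handful of strings is a sanity check only; the byte-level digest comparison of tokenizer.json is the stronger evidence.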

Provenance

Source: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (revision 49ad1f46ee9df444a0a3b8b63520faa1ca66324a)
Modifications: none
License: NVIDIA Open Model License (inherited from upstream)
Sibling: geodesic-research/nemotron-instruct-tokenizer — same encoder, chat template stripped of <think> injection
