Nemotron Think Tokenizer
A byte-identical mirror of the nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 tokenizer, hosted under the geodesic-research namespace for stable referencing in our reasoning / thinking SFT pipelines. No modifications.
Why mirror?
The upstream NVIDIA tokenizer ships a chat template that supports <think>...</think> reasoning traces, which makes it the right tool for any model trained with reasoning data. We host an unmodified copy so:
- Our training configs can reference a stable geodesic-research/* path that won't shift if NVIDIA re-tags the upstream repo.
- It pairs cleanly with geodesic-research/nemotron-instruct-tokenizer: one is the reasoning variant, the other strips think-tag injection. Both share the same encoder.
- A single naming convention (nemotron-think-* vs nemotron-instruct-*) makes it explicit at the config level which behavior a training run expects.
Contents
| File | sha256 | Source |
|---|---|---|
| tokenizer.json | 623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7 | upstream Super 120B BF16, verbatim |
| tokenizer_config.json | matches upstream | upstream Super 120B BF16, verbatim |
| special_tokens_map.json | matches upstream | upstream Super 120B BF16, verbatim |
| chat_template.jinja | 575fb74f54ed264df9047d0ecce3c98938aae953fb4f50356675706264cbb68a (10771 B) | upstream Super 120B BF16, verbatim |
The tokenizer.json blob is also byte-identical to nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16, and nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16; the entire Nemotron 3 family shares one encoder.
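The digests in the table can be checked locally. The following is a minimal sketch, assuming `huggingface_hub` is installed and the Hub is reachable; the helper names `sha256_of_file` and `fetch_digest` are illustrative and not part of this repo.

```python
import hashlib
from pathlib import Path

# Digest listed for tokenizer.json in the Contents table.
EXPECTED_TOKENIZER_SHA256 = (
    "623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_digest(repo_id, filename, revision=None):
    """Download one file from the Hub and return its sha256 (hypothetical helper)."""
    # Imported lazily so sha256_of_file stays usable without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id, filename=filename, revision=revision
    )
    return sha256_of_file(local_path)
```

Calling `fetch_digest("geodesic-research/nemotron-think-tokenizer", "tokenizer.json")` and comparing the result with `EXPECTED_TOKENIZER_SHA256` should confirm the mirror; the same call against an upstream repo, optionally with a pinned `revision`, allows a like-for-like comparison.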
Chat template behavior
This is the upstream Nemotron 3 reasoning template. Default behavior:
- enable_thinking defaults to True. The generation prompt ends at <|im_start|>assistant\n<think>\n to elicit a reasoning trace.
- <think></think> is auto-prepended to assistant messages whose content lacks think tags, so {"role": "assistant", "content": "42"} renders as <|im_start|>assistant\n<think></think>42<|im_end|>.
- The reasoning_content field is supported. A message like {"role": "assistant", "reasoning_content": "let me check", "content": "42"} renders as <|im_start|>assistant\n<think>\nlet me check\n</think>\n42<|im_end|>.
- truncate_history_thinking=True by default. Older assistant turns have their reasoning traces stripped and replaced with <think></think> stubs, keeping only the final answer in context.
- low_effort=False by default. When set to True, the template appends \n\n{reasoning effort: low} to the last user message as a hint to the model to produce shorter chains of thought.
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")
msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4."},  # auto-prepended with <think></think>
    {"role": "user", "content": "And 3+3?"},
]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# Ends with: ...<|im_start|>assistant\n<think>\n
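To make the assistant-message rules and the low_effort hint from this section concrete, here is a simplified plain-Python imitation. It is not the upstream Jinja template: the function names are hypothetical, and the check for existing think tags is an assumption about how "lacks think tags" is decided.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
LOW_EFFORT_HINT = "\n\n{reasoning effort: low}"


def render_assistant(message):
    """Render one assistant message following the documented rules (simplified)."""
    content = message.get("content", "")
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        # reasoning_content is wrapped in a think block ahead of the answer.
        body = f"<think>\n{reasoning}\n</think>\n{content}"
    elif "<think>" in content or "</think>" in content:
        # Assumption: content that already carries think tags is left untouched.
        body = content
    else:
        # Content without think tags gets an empty stub prepended.
        body = f"<think></think>{content}"
    return f"{IM_START}assistant\n{body}{IM_END}"


def apply_low_effort(messages):
    """Return a copy of messages with the hint appended to the last user turn."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] += LOW_EFFORT_HINT
            break
    return out
```

The two `render_assistant` branches reproduce the rendered strings quoted in the list above. For real prompts, rely on `apply_chat_template`, which applies the actual template shipped with the tokenizer.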
When to use this tokenizer
| Use case | Use this tokenizer? |
|---|---|
| Reasoning / thinking SFT (training data has <think>...</think> traces) | Yes |
| Distillation from a reasoning teacher model | Yes |
| Instruct SFT with no reasoning | No. Use geodesic-research/nemotron-instruct-tokenizer instead; it avoids stray </think> echoes at inference |
| Continued pretraining (CPT) on raw text | Either works; the chat template is irrelevant for .bin/.idx data |
| Evaluating a reasoning-trained model with vLLM | Yes |
| Evaluating an instruct (non-reasoning) model | No. Use the instruct variant; this template emits <think>\n on the generation prompt, which mismatches an instruct model's training distribution |
Compatibility
- vLLM: works out of the box. tokenizer_class: PreTrainedTokenizerFast, no backend/is_local keys, no custom Python files. Compatible with transformers 4.57.x and 5.x.
- HuggingFace generation: standard generate() works; <|im_end|> is registered as the eos_token.
- Existing Nemotron checkpoints: vocab, merges, special tokens, and added-token IDs all match the entire Nemotron 3 family. Drop-in replacement at the encoder level.
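One way to spot-check the drop-in claim is to compare token IDs from two tokenizers on a few sample strings. This is a rough sketch, not an exhaustive equivalence test; `encoders_agree` and `load_and_compare` are hypothetical helpers, and the second one assumes `transformers` is installed and both repos are reachable.

```python
def encoders_agree(tok_a, tok_b, samples):
    """Return True if both tokenizers give identical token IDs for every sample."""
    return all(
        tok_a.encode(text, add_special_tokens=False)
        == tok_b.encode(text, add_special_tokens=False)
        for text in samples
    )


def load_and_compare(repo_a, repo_b, samples):
    """Load two tokenizers from the Hub and compare them on the samples."""
    # Imported lazily so encoders_agree can be used with already-loaded tokenizers.
    from transformers import AutoTokenizer

    return encoders_agree(
        AutoTokenizer.from_pretrained(repo_a),
        AutoTokenizer.from_pretrained(repo_b),
        samples,
    )
```

Useful samples include plain text, text containing `<think>` and `</think>`, and the chat markers `<|im_start|>` and `<|im_end|>`, since the claim covers special and added tokens as well as the base vocabulary.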
Provenance
- Source: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (revision 49ad1f46ee9df444a0a3b8b63520faa1ca66324a)
- Modifications: none
- License: NVIDIA Open Model License (inherited from upstream)
- Sibling: geodesic-research/nemotron-instruct-tokenizer (same encoder, chat template stripped of <think> injection)