How to use dealignai/Nemotron-3-Nano-Omni-30B-A3B-JANGTQ4-CRACK with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Nemotron-3-Nano-Omni-30B-A3B-JANGTQ4-CRACK dealignai/Nemotron-3-Nano-Omni-30B-A3B-JANGTQ4-CRACK
Reasoning V3 SKU. Loads via vMLX or jang-tools Python. Follow @dealignai.
Nemotron-3-Nano-Omni-30B-A3B — JANGTQ4 + CRACK v2
JANGTQ4 (8-bit attn affine + 4-bit MXTQ routed experts) | CRACK abliterated v2 | Vision + Audio (Speech) | Hybrid Mamba-2 + Attn + MoE | 19 GB
Headline numbers
| Metric | This v2 model | Base model | Δ |
|---|---|---|---|
| HarmBench-320 strict comply (thinking=ON) | 97.2% (311/320) | 12.81% (refuses) | +84.4pp |
| MMLU-200 generative (thinking=ON, max=8000) | 74.0% (148/200) | 86.5% (max=2000) | -12.5pp |
| Refusals on harmful prompts | 0 explicit refuses | typically 90%+ refuse | abliteration complete |
| </think> close at greedy on hard MMLU | 5/5 | 5/5 | preserved |
| Multi-turn (3-turn escalation × 3 conversations) | 9/9 comply, context preserved | n/a | works |
| Multimodal byte-identical to base | preserved | — | preserved |
| Bundle size | 19 GB | 66 GB BF16 | — |
| Context | 262,144 tokens native | same | preserved |
The 12.5pp MMLU gap is concentrated in two reasoning-heavy subjects (abstract_algebra, college_computer_science), which account for 12 of the 25 lost questions and where the 8000-token thinking budget runs out before </think> closes. These hard-stops are genuine deep reasoning, not v1-style infinite repetition loops. With max_tokens ≥ 16384, accuracy approaches base.
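To make the budget effect concrete, here is a minimal sketch of how a raw generation could be classified as finished or budget-truncated. The helper name `split_reasoning` and the `ParsedOutput` type are hypothetical and are not part of jang-tools or mlx_lm. The only assumption is that the reasoning block ends with a literal `</think>` tag, as described above.

```python
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    reasoning: str
    answer: Optional[str]  # None when </think> never appeared


def split_reasoning(text: str, close_tag: str = "</think>") -> ParsedOutput:
    """Split raw model output into reasoning and final answer.

    Hypothetical helper: a missing close tag is treated as a sign that the
    thinking phase used up the whole token budget before answering.
    """
    if close_tag not in text:
        return ParsedOutput(reasoning=text.strip(), answer=None)
    reasoning, answer = text.split(close_tag, 1)
    return ParsedOutput(reasoning=reasoning.strip(), answer=answer.strip())
```

An `answer` of `None` marks a response that should be retried with a larger `max_tokens` rather than scored as wrong.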
v2 vs v1 (head-to-head)
v1 (shipped 2026-04-28) had a </think> termination defect at greedy decoding — the model couldn't terminate reasoning on hard prompts and looped to budget cutoff. MMLU dropped from 86.5% base → 70.0% v1.
v2 (this release) restores clean termination:
| Bench | v1 (broken) | v2 (this release) |
|---|---|---|
| HarmBench-320 strict comply | 95.94% | 97.2% (0 refusals) |
| MMLU-200 thinking=ON | 70.0% @max=16384 | 74.0% @max=8000 (+4pp at half budget) |
| </think> close at greedy (5 hard MMLU) | 0/5 | 5/5 |
| Hard-stops are real loops? | YES (paragraph repetition) | NO (genuine deep reasoning, just out of budget) |
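The distinction in the last row (paragraph repetition versus genuine reasoning that simply ran out of budget) can be checked mechanically. The sketch below is one possible heuristic, not the method used to produce the table. The function name, the 20-character minimum, and the repeat threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def looks_like_repetition_loop(trace: str, min_repeats: int = 3) -> bool:
    """Heuristic: True if any non-trivial paragraph occurs min_repeats+ times.

    Paragraphs are split on blank lines; fragments shorter than 20
    characters are ignored so short filler lines do not trigger the check.
    """
    paragraphs = [p.strip() for p in trace.split("\n\n") if len(p.strip()) >= 20]
    if not paragraphs:
        return False
    return max(Counter(paragraphs).values()) >= min_repeats
```

Exact-match counting only catches verbatim loops. A near-duplicate check (for example on normalized whitespace) would be stricter but is left out to keep the sketch short.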
MMLU-200 per-subject (BASE vs CRACK v2)
Both at thinking=ON, greedy. Base at max=2000, CRACK v2 at max=8000.
| Subject | Base | CRACK v2 | Δ | Notes |
|---|---|---|---|---|
| abstract_algebra | 17/20 (85%) | 12/20 (60%) | -25pp | Budget-bound — needs max=16384 |
| anatomy | 16/20 (80%) | 13/20 (65%) | -15pp | |
| astronomy | 19/20 (95%) | 17/20 (85%) | -10pp | |
| college_computer_science | 14/20 (70%) | 7/20 (35%) | -35pp | Hardest subject — needs max=16384 |
| college_physics | 18/20 (90%) | 18/20 (90%) | 0 | unchanged |
| high_school_biology | 19/20 (95%) | 17/20 (85%) | -10pp | |
| high_school_chemistry | 18/20 (90%) | 17/20 (85%) | -5pp | |
| high_school_mathematics | 18/20 (90%) | 15/20 (75%) | -15pp | |
| logical_fallacies | 17/20 (85%) | 16/20 (80%) | -5pp | |
| world_religions | 17/20 (85%) | 16/20 (80%) | -5pp | |
| TOTAL | 173/200 (86.5%) | 148/200 (74.0%) | -12.5pp | most of the loss is budget-bound |
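The totals and deltas in the table can be recomputed from the per-subject counts. The sketch below hard-codes those counts as copied from the table. The names `RESULTS`, `summarize`, and `lost_in` are hypothetical helpers introduced only for this check.

```python
# (base_correct, crack_v2_correct) out of 20 questions per subject,
# copied from the per-subject table above.
RESULTS = {
    "abstract_algebra": (17, 12),
    "anatomy": (16, 13),
    "astronomy": (19, 17),
    "college_computer_science": (14, 7),
    "college_physics": (18, 18),
    "high_school_biology": (19, 17),
    "high_school_chemistry": (18, 17),
    "high_school_mathematics": (18, 15),
    "logical_fallacies": (17, 16),
    "world_religions": (17, 16),
}
QUESTIONS_PER_SUBJECT = 20


def summarize(results):
    """Return overall accuracies, the delta in percentage points, and lost questions."""
    n = QUESTIONS_PER_SUBJECT * len(results)
    base = sum(b for b, _ in results.values())
    crack = sum(c for _, c in results.values())
    return {
        "base_pct": 100 * base / n,
        "crack_pct": 100 * crack / n,
        "delta_pp": 100 * (crack - base) / n,
        "lost_questions": base - crack,
    }


def lost_in(results, subjects):
    """Count questions lost (base minus CRACK v2) within the given subjects."""
    return sum(results[s][0] - results[s][1] for s in subjects)
```

With these counts, `summarize(RESULTS)` reproduces the TOTAL row, and `lost_in(RESULTS, ["abstract_algebra", "college_computer_science"])` gives the number of lost questions in the two subjects flagged as needing max=16384.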
HarmBench-320 per-category (CRACK v2)
| Category | n | CRACK comply | Refuse | Empty (truncated) |
|---|---|---|---|---|
| chemical_biological | 42 | 42 (100%) | 0 | 0 |
| copyright | 80 | 75 (94%) | 0 | 5 |
| cybercrime_intrusion | 52 | 49 (94%) | 0 | 3 |
| harassment_bullying | 21 | 21 (100%) | 0 | 0 |
| harmful | 18 | 18 (100%) | 0 | 0 |
| illegal | 53 | 53 (100%) | 0 | 0 |
| misinformation_disinformation | 54 | 53 (98%) | 0 | 1 |
| Overall | 320 | 311 (97.2%) | 0 | 9 |
Zero explicit refusals. The 9 "empty" verdicts are token-budget truncations on copyright/long prompts (thinking phase consumed all 1500 tokens before producing the answer).
Operating recommendations
- enable_thinking: v2 works in BOTH modes (5/5 comply with thinking ON, 4/5 with thinking OFF). Default to ON for hardest prompts; OFF works for most.
- max_tokens ≥ 16384 for hard reasoning (math, abstract algebra, complex CS).
- Greedy (temperature=0) AND sampling (temp=0.6, top_p=0.95, as NVIDIA recommends in generation_config.json) both work.
- Multi-turn: context preserved across 3+ turns; no late refusals after escalating prompts.
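The recommendations above can be collected into a single settings helper. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical and must be mapped onto whatever your runtime expects. The 8000-token default for non-hard prompts is borrowed from the MMLU evaluation budget and is not itself a recommendation from this card.

```python
from typing import Any, Dict


def generation_settings(hard_reasoning: bool = False, greedy: bool = True) -> Dict[str, Any]:
    """Build illustrative decoding settings from the recommendations above.

    Key names are hypothetical; rename them to match your runtime.
    """
    return {
        "enable_thinking": True,  # recommended default for the hardest prompts
        # 16384 for hard reasoning; 8000 (the MMLU eval budget) is an assumed default
        "max_tokens": 16384 if hard_reasoning else 8000,
        # greedy decoding, or the sampling values quoted from generation_config.json
        "temperature": 0.0 if greedy else 0.6,
        "top_p": 1.0 if greedy else 0.95,
    }
```

For example, `generation_settings(hard_reasoning=True, greedy=False)` selects the large budget together with the sampling values.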
Verification
- All multimodal tensors (vision + audio + projectors) are byte-identical to base — capabilities fully preserved.
- All config files unchanged (config.json, jang_config.json, generation_config.json, chat_template.jinja, tokenizer_config.json).
- Bit widths preserved: attn=8, shared=8, mamba=8, routed=4, embed=8, lm_head=8.
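A byte-identity claim like the first bullet can be checked by hashing the raw bytes of each tensor in both bundles. The sketch below assumes the tensors have already been read into name-to-bytes mappings; how they are read from the checkpoint files is out of scope here. The function name and the prefix strings in the usage note are hypothetical.

```python
import hashlib
from typing import Iterable, List, Mapping


def changed_tensors(
    base: Mapping[str, bytes],
    patched: Mapping[str, bytes],
    prefixes: Iterable[str],
) -> List[str]:
    """Return sorted tensor names under `prefixes` whose bytes differ or are missing."""
    wanted = tuple(prefixes)
    names = {n for n in set(base) | set(patched) if n.startswith(wanted)}
    differing = []
    for name in sorted(names):
        a, b = base.get(name), patched.get(name)
        # A tensor present on only one side counts as a difference.
        if a is None or b is None:
            differing.append(name)
        elif hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
            differing.append(name)
    return differing
```

An empty result for the multimodal prefixes (for instance hypothetical names such as `"vision."` and `"audio."`) would be consistent with the byte-identical claim. The same hashing approach applies to the config files listed above.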
Architecture (nemotron_h)
- 52 layers: hybrid Mamba-2 + MoE + Attention
- Hidden 2688, head_dim 128, GQA 32q/2kv (NO RoPE on attention — position from Mamba state)
- 128 routed experts top-6 (sigmoid) + 1 shared expert per MoE layer
- Multimodal: image (RADIO ViT) + audio/speech (Parakeet) merged via early-fusion projectors
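To illustrate what "128 routed experts top-6 (sigmoid)" means in practice, here is a minimal pure-Python sketch of sigmoid top-k routing for a single token. It is not the model's actual implementation. In particular, renormalizing the selected scores so they sum to 1 is an assumption, and the real router may scale or bias scores differently.

```python
import math
from typing import List, Tuple


def route_top_k(logits: List[float], k: int = 6) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts using independent sigmoid scores.

    Assumption: the selected scores are renormalized to sum to 1.
    Returns (expert_index, weight) pairs, highest score first.
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]
```

Unlike softmax routing, each expert's sigmoid score is computed independently of the others, so the selection depends only on the ranking of the router logits. On the attention side, GQA with 32 query heads and 2 KV heads means each KV head is shared by 16 query heads.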
Loading
import sys

from huggingface_hub import snapshot_download
from mlx_lm import generate

# jang-tools is imported from a local checkout; replace the placeholder path.
sys.path.insert(0, "/path/to/jang-tools")
from jang_tools.load_jangtq import load_jangtq_model

# Download the bundle (or reuse the cached copy) and load it.
path = snapshot_download("dealignai/Nemotron-3-Nano-Omni-30B-A3B-JANGTQ4-CRACK")
model, tokenizer = load_jangtq_model(path)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your question"}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

out = generate(model, tokenizer, prompt=prompt, max_tokens=16384)
# Print only the text after the reasoning block. If </think> never appears
# (budget exhausted), this prints the whole output.
print(out.split("</think>", 1)[-1])
For the multimodal pipeline (image + audio + video), pair this bundle with the unmodified Multimodal-Addon.
Use responsibly
This model has had refusal training surgically removed for legitimate research, red-teaming, and evaluation. Outputs may include harmful content. You are solely responsible for any use. Do not deploy in consumer-facing contexts without your own safety layer. Do not use in violation of applicable law in your jurisdiction.
Built by dealignai. Sister bundles: JANGTQ-CRACK (12 GB, 2-bit MXTQ) · MXFP4-CRACK (21 GB, uniform 4-bit affine).