
Corrupted Weight Shards (6–10): shards 6–10 are 40-byte "ghost" pointer files

#7
by CRY24180339 - opened

The main issue with the NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 repository is a hydration failure in Hugging Face's Xet storage backend.

Here is a concise breakdown of the technical faults for your issue report:

  1. Corrupted Weight Shards (6–10)

    The Symptom: Shards model-00006-of-00010.safetensors through model-00010-of-00010.safetensors are served as tiny 40-byte text files rather than multi-gigabyte binary weights.

    The Cause: These files are currently "ghost" pointer files from Hugging Face's Xet storage backend. On this repository branch, the backend fails to "hydrate" (materialize) these pointers into the actual tensor data during download.

  2. Broken Safetensors Headers

    The Symptom: Loading these shards fails with the error "Error while deserializing header: header too large".

    The Cause: A valid safetensors file must begin with an 8-byte little-endian unsigned integer (u64) giving the length of the JSON metadata header. Because these files contain Git LFS/Xet pointer text (version https://git-lfs...), the loader reads the first 8 bytes of that text ("version ") as the header length, gets an enormous value, and aborts.

  3. Repository Incompatibility

    Environment Failure: Standard tools such as huggingface-cli, hf_hub_download, and even the huggingface_hub[xet] extra fail to resolve these specific pointers into the real weight data.

    Blocker: This prevents TensorRT-LLM from building an engine, as the weight map cannot be verified without readable shard headers.
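The header failure described in item 2 can be checked offline before attempting an engine build. Below is a minimal sketch using only the Python standard library. It classifies a local shard as a pointer stub, a malformed file, or a readable safetensors file. It assumes that a stub begins with the Git LFS pointer line quoted in the report; the function name `inspect_shard` and the `LFS_PREFIX` check are illustrative and not part of any library.

```python
import glob
import json
import os
import struct
import sys

# Assumption: stub files start with the Git LFS pointer line quoted in the report.
LFS_PREFIX = b"version https://git-lfs"


def inspect_shard(path: str) -> tuple[str, str]:
    """Classify a shard as 'pointer', 'invalid', or 'ok' with a short detail."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        start = f.read(len(LFS_PREFIX))
        if len(start) < 8:
            return "invalid", f"{size} bytes is too small for a header length"
        # safetensors layout: u64 little-endian header length, JSON header, raw data.
        (header_len,) = struct.unpack("<Q", start[:8])
        if start.startswith(LFS_PREFIX):
            # This is the value a loader sees, hence "header too large".
            return "pointer", f"{size}-byte text stub; first 8 bytes decode to {header_len}"
        if 8 + header_len > size:
            return "invalid", f"header length {header_len} exceeds file size {size}"
        f.seek(8)
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    tensors = [k for k in header if k != "__metadata__"]
    return "ok", f"{len(tensors)} tensor(s)"


if __name__ == "__main__":
    # Scan a download folder (default: current directory) for model shards.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for p in sorted(glob.glob(os.path.join(folder, "model-*.safetensors"))):
        status, detail = inspect_shard(p)
        print(f"{os.path.basename(p)}: {status} ({detail})")
```

Run as a script with the download folder as its argument, it prints one line per `model-*.safetensors` file. Any shard reported as `pointer` still has to be re-fetched. The bytes `version ` decode to a u64 larger than 10^18, which is why the loader rejects the file rather than reading it as a short header.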

Suggested Issue Title: Shards 6-10 are 40-byte Xet pointers; "header too large" error on load

Concise Description: Shards 6 through 10 of the NVFP4 model are currently stuck as 40-byte Xet pointer files. Standard materialization via huggingface_hub[xet] fails to hydrate them into valid safetensors files. Loading them raises "Error while deserializing header: header too large", making it impossible to build a TensorRT-LLM engine or run inference on Blackwell hardware.

NVIDIA org

Thanks for raising this issue. The HF artifacts have just been updated; please let us know if you still face a build issue.
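After an upstream fix, a locally cached stub is not replaced automatically. The following is a minimal sketch that assumes `huggingface_hub` is installed and that the repository id is `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4`, inferred from the repository name in this thread. It rebuilds the file names of shards 6–10 and re-fetches them with `hf_hub_download(..., force_download=True)`, which ignores the cached copy. The helpers `shard_names` and `refetch`, and the 1 MiB size threshold, are hypothetical.

```python
import os

# Assumption: repo id inferred from the repository name mentioned in the thread.
REPO_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4"
# Assumption: anything smaller than this is treated as a stub, not a real shard.
MIN_SHARD_BYTES = 1 << 20


def shard_names(first: int, last: int, total: int) -> list[str]:
    """Build shard file names such as model-00006-of-00010.safetensors."""
    return [f"model-{i:05d}-of-{total:05d}.safetensors" for i in range(first, last + 1)]


def refetch(names: list[str], repo_id: str = REPO_ID) -> list[tuple[str, int, bool]]:
    """Re-download each shard, bypassing the local cache, and report its size."""
    # Imported here so shard_names works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    results = []
    for name in names:
        path = hf_hub_download(repo_id=repo_id, filename=name, force_download=True)
        size = os.path.getsize(path)  # follows the cache symlink to the blob
        results.append((name, size, size >= MIN_SHARD_BYTES))
    return results
```

For example, `refetch(shard_names(6, 10, 10))` returns one `(name, size, looks_real)` triple per shard. Any `False` entry suggests a stub is still being served.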

bkartal changed discussion status to closed
