Error loading the model

#1
by cpatonn - opened
cyankiwi org

Hello, this model does not support --tensor-parallel-size > 2, so to use more than two GPUs please combine --pipeline-parallel-size with --tensor-parallel-size to avoid a model loading error.
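A rough sketch of that split, assuming the flags are vLLM's (the engine is not named here) and a node with 4 GPUs: tensor parallelism stays at 2 and pipeline parallelism covers the rest. `MODEL` is a placeholder rather than the real repository id, and the script only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Placeholder: substitute the actual model repository id or local path.
MODEL="<model-id>"

# Assumption: 4 GPUs in total, split as TP=2 x PP=2.
TP=2   # values above 2 are reported to fail at load time
PP=2   # pipeline stages cover the remaining GPUs

# Assumption: the server is vLLM, whose CLI accepts these flag names.
CMD="vllm serve $MODEL --tensor-parallel-size $TP --pipeline-parallel-size $PP"

# Print the command for review instead of launching the server.
echo "$CMD"
```

With 8 GPUs the same idea would give `TP=2` and `PP=4`; the product of the two values should match the number of GPUs used.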

In addition, MTP layers are implemented and can be invoked using the flag --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}', but MTP layers cannot be used together with pipeline parallelism.
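For the MTP case, a minimal sketch under the same assumptions (vLLM as the server, `MODEL` as a placeholder): the speculative config is passed exactly as quoted above and pipeline parallelism is left out, which, given the tensor-parallel limit, means at most 2 GPUs. The script only prints the command.

```shell
#!/usr/bin/env bash
# Placeholder: substitute the actual model repository id or local path.
MODEL="<model-id>"

# Tensor parallelism only; no --pipeline-parallel-size, since MTP
# is reported not to work together with pipeline parallelism.
TP=2

# Speculative decoding config copied verbatim from the post.
SPEC_CONFIG='{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

# Assumption: the server is vLLM. The JSON is wrapped in single quotes
# so the printed command can be pasted into a shell as-is.
CMD="vllm serve $MODEL --tensor-parallel-size $TP --speculative-config '$SPEC_CONFIG'"

# Print the command for review instead of launching the server.
echo "$CMD"
```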

Thanks for releasing the quant version <3

Thank you!! :D
