This version is still too large for a single 5090

#4
by iwaitu - opened

The NVFP4 version is 32 GB, which is still too large for a single 5090. Could more of the network be converted to NVFP4 so it can run inference on a single 5090?

Same question )

Same here. It's nearly as large as fp8, so what's the point of this NVFP4 quantization?
Indeed, this version is about the same size as the fp8 quantized version. What is NVIDIA doing? Is it only meant for the RTX Pro 6000?

I thought I could defy the odds, but I failed after all.

DGX Spark fails too

It loads successfully with --cpu-offload-gb 10, but the speed... [sad]
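For anyone trying the same workaround, here is a minimal sketch of serving with partial CPU offload. The model name and offload size follow the posts in this thread; adjust `--cpu-offload-gb` to however much VRAM the GPU is short by:

```shell
# Offload ~10 GB of weights to system RAM so the remainder fits in 32 GB of VRAM.
# Offloaded weights are streamed over PCIe during each forward pass, which is
# why throughput drops sharply compared to a fully GPU-resident model.
vllm serve nvidia/Gemma-4-31B-IT-NVFP4 \
    --quantization modelopt \
    --cpu-offload-gb 10 \
    --port 8000
```

This trades speed for the ability to load at all; the larger the offloaded slice, the slower each token.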

It didn't work on the Spark?

The model size was intentionally set so that it wouldn't work well with 32GB GPUs. Perhaps they want to sell more RTX Pro 6000s.

There's a problem with this quant; the quantization accomplishes basically nothing.

I ran it on DGX Spark, and it works.

My system boots with init 3, so it is in headless mode. I run it with Docker:

```shell
docker run --runtime nvidia --gpus all -it --rm -d --env "HF_HUB_OFFLINE=1" \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:gemma4-cu130 --model nvidia/Gemma-4-31B-IT-NVFP4 --quantization modelopt --enable-auto-tool-choice --tool-call-parser gemma4 --reasoning-parser gemma4
```

The model takes all the memory, though.

I don't have quantitative results, but the outputs on images of complex procedural graphs or scientific figures work, and the details (e.g. numbers) read from the images are accurate.

Currently, the included tool-call parser (docker image 9afe08ebfa30) is not correct and may format tool arguments incorrectly.

The vllm-openai:gemma4-cu130 image has a problem: deploying on 2 x H100, the GPU driver crashes when I try batched API calls. Deploying qwen3.5-122b-a10b or qwen3.5-27b with vllm-openai:latest-cu130 passes the same tests without issue.
