Tags: Image-Text-to-Text, Transformers, Safetensors, English, gemma4, gemma, google, Mixture of Experts, mixture-of-experts, transformer, 26b, nvfp4, fp4, 4-bit precision, quantized, modelopt, weight-quantization, uncensored, abliterated, unfiltered, refusal-removed, vision, multimodal, text-generation, tool-calling, function-calling, reasoning, thinking, chat, instruct, agentic, coding, creative-writing, dgx-spark, blackwell, gb10, grace-blackwell, nvidia, gpu, vllm, openai-api, openai-compatible, fp8-kv-cache, prefix-caching, chunked-prefill, sliding-window-attention, english, production-ready, conversational
Scaling with concurrency?
#1
by JDWarner - opened
You ran a very nice test of aggregate scaling on GB10 for https://huggingface.co/AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4. This model seems similar, but SUPERgemma claims higher throughput and overall performance, and its base model is better benchmarked against the original than the Uncensored model is.
I'm curious whether the higher performance keeps scaling with concurrency or hits a lower ceiling. Could you test this one too?
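For anyone wanting to reproduce this kind of comparison themselves, a concurrency sweep against a vLLM OpenAI-compatible endpoint is straightforward to script. Below is a minimal sketch, assuming a stubbed request function (`fake_completion` is a placeholder; swap it for a real chat-completions call against your server): it measures aggregate tokens/s at several concurrency levels so you can see where throughput plateaus. All names and parameters here are illustrative, not from the original post.

```python
# Hedged sketch of a concurrency-sweep throughput benchmark.
# The HTTP call is stubbed so the script is self-contained; replace
# `fake_completion` with a real OpenAI-compatible client request
# (e.g. POST /v1/chat/completions against a vLLM server).
import asyncio
import time


async def fake_completion(prompt: str) -> int:
    """Stand-in for a real request; returns the generated-token count."""
    await asyncio.sleep(0.01)  # simulated decode latency
    return 128                 # simulated tokens generated per request


async def run_at_concurrency(n: int, requests_per_worker: int = 4) -> float:
    """Return aggregate throughput (tokens/s) with `n` concurrent workers."""
    async def worker() -> int:
        total = 0
        for _ in range(requests_per_worker):
            total += await fake_completion("benchmark prompt")
        return total

    start = time.perf_counter()
    totals = await asyncio.gather(*(worker() for _ in range(n)))
    elapsed = time.perf_counter() - start
    return sum(totals) / elapsed


async def sweep(levels=(1, 2, 4, 8)) -> dict[int, float]:
    """Measure aggregate tokens/s at each concurrency level."""
    return {n: await run_at_concurrency(n) for n in levels}


if __name__ == "__main__":
    for n, tps in asyncio.run(sweep()).items():
        print(f"concurrency={n:2d}  aggregate={tps:8.0f} tok/s")
```

With a real endpoint, the interesting signal is the ratio between successive levels: near-linear gains mean the GPU still has headroom, while a flat curve marks the ceiling JDWarner is asking about.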