Tags: Image-Text-to-Text, Transformers, Safetensors, English, gemma4, gemma, google, Mixture of Experts, mixture-of-experts, transformer, 26b, nvfp4, fp4, 4-bit precision, quantized, modelopt, weight-quantization, uncensored, abliterated, unfiltered, refusal-removed, vision, multimodal, text-generation, tool-calling, function-calling, reasoning, thinking, chat, instruct, agentic, coding, creative-writing, dgx-spark, blackwell, gb10, grace-blackwell, nvidia, gpu, vllm, openai-api, openai-compatible, fp8-kv-cache, prefix-caching, chunked-prefill, sliding-window-attention, english, production-ready, conversational
Scaling with concurrency?
#1
by JDWarner - opened
You ran a very nice test of aggregate scaling on GB10 for https://huggingface.co/AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4. This model seems similar, but SUPERgemma claims higher throughput and overall performance, and its base model is better benchmarked against the original than the Uncensored model is.
I'm curious whether the higher performance keeps scaling with concurrency or hits a lower ceiling. Could you test this one too?
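For anyone wanting to reproduce this kind of comparison themselves, a concurrency sweep against a vLLM OpenAI-compatible endpoint is straightforward to script. Below is a minimal sketch, assuming a stubbed request function (`fake_completion` is a placeholder; swap it for a real chat-completions call against your server): it measures aggregate tokens/s at several concurrency levels so you can see where throughput plateaus. All names and parameters here are illustrative, not from the original post.

```python
# Hedged sketch of a concurrency-sweep throughput benchmark.
# The HTTP call is stubbed so the script is self-contained; replace
# `fake_completion` with a real OpenAI-compatible client request
# (e.g. POST /v1/chat/completions against a vLLM server).
import asyncio
import time


async def fake_completion(prompt: str) -> int:
    """Stand-in for a real request; returns the generated-token count."""
    await asyncio.sleep(0.01)  # simulated decode latency
    return 128                 # simulated tokens generated per request


async def run_at_concurrency(n: int, requests_per_worker: int = 4) -> float:
    """Return aggregate throughput (tokens/s) with `n` concurrent workers."""
    async def worker() -> int:
        total = 0
        for _ in range(requests_per_worker):
            total += await fake_completion("benchmark prompt")
        return total

    start = time.perf_counter()
    totals = await asyncio.gather(*(worker() for _ in range(n)))
    elapsed = time.perf_counter() - start
    return sum(totals) / elapsed


async def sweep(levels=(1, 2, 4, 8)) -> dict[int, float]:
    """Measure aggregate tokens/s at each concurrency level."""
    return {n: await run_at_concurrency(n) for n in levels}


if __name__ == "__main__":
    for n, tps in asyncio.run(sweep()).items():
        print(f"concurrency={n:2d}  aggregate={tps:8.0f} tok/s")
```

With a real endpoint, the interesting signal is the ratio between successive levels: near-linear gains mean the GPU still has headroom, while a flat curve marks the ceiling JDWarner is asking about.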