Scaling with concurrency?

#1
by JDWarner - opened

You did a very nice test on https://huggingface.co/AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4 regarding aggregate scaling on GB10. This model seems similar, but SUPERgemma claims higher throughput and better overall performance. The Uncensored model is also not as well benchmarked or compared against the original as this one's base model is.

I'm curious whether the higher performance scales with concurrency or hits a lower ceiling. Could you test this one too?
