EPYC 9355 CPU-only sweep-bench

#3
by sousekd - opened

People on Reddit sometimes ask about EPYC CPU-only performance; my GPUs are currently out of order, so here are CPU-only results from a single Turin EPYC 9355 (12× DDR5-6400) running GLM-4.5-Air HQ4_K with ik_llama.cpp:
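Token generation on CPU is usually memory-bandwidth bound, so a quick ceiling estimate is useful context for the numbers below. A back-of-envelope sketch for 12× DDR5-6400; the ~7 GB "weights read per token" figure is an assumption for illustration (roughly a ~12B-active-parameter MoE at 4-bit), not a measured value:

```python
# Back-of-envelope memory-bandwidth ceiling for this setup.
# Assumptions (not from the post): standard 64-bit DDR5 channels,
# and a hypothetical ~7 GB of weights touched per generated token.
CHANNELS = 12
MT_PER_S = 6400e6          # DDR5-6400 transfer rate
BYTES_PER_TRANSFER = 8     # 64-bit channel width

peak_gbps = CHANNELS * MT_PER_S * BYTES_PER_TRANSFER / 1e9
print(f"theoretical peak: {peak_gbps:.1f} GB/s")  # 614.4 GB/s

bytes_per_token_gb = 7.0   # assumed active weights read per token
print(f"TG ceiling: {peak_gbps / bytes_per_token_gb:.0f} t/s")
```

Real runs land well under the theoretical peak, but the exercise shows why DIMM speed (and throttling) dominates TG throughput here.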

```
./llama-sweep-bench \
    --model "$MODEL_PATH" \
    --no-mmap \
    -fa on -amb 512 \
    -b 2048 -ub 1024 \
    -ctk f16 -ctv f16 -c 131072 \
    --threads 20 \
    --threads-batch 30 \
    --warmup-batch \
    -n 128
```
| PP | TG | N_KV | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) |
|---:|---:|-----:|---------:|-----------:|---------:|-----------:|
| 1024 | 128 | 0 | 3.544 | 288.95 | 4.404 | 29.07 |
| 1024 | 128 | 1024 | 4.037 | 253.65 | 5.061 | 25.29 |
| 1024 | 128 | 2048 | 4.521 | 226.49 | 5.113 | 25.04 |
| ... | ... | ... | ... | ... | ... | ... |
| 1024 | 128 | 46080 | 35.672 | 28.71 | 22.420 | 5.71 |
| 1024 | 128 | 47104 | 36.476 | 28.07 | 28.743 | 4.45 |
| 1024 | 128 | 48128 | 37.586 | 27.24 | 22.298 | 5.74 |
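For anyone wanting to plot their own runs, sweep-bench rows are easy to scrape from stdout. A minimal parser sketch, assuming the column order from the header above (PP, TG, N_KV, T_PP, S_PP, T_TG, S_TG); `parse_rows` is a hypothetical helper, not part of ik_llama.cpp:

```python
# Minimal parser for llama-sweep-bench table rows (a sketch; column
# order taken from the header: PP TG N_KV T_PP S_PP T_TG S_TG).
def parse_rows(text):
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 7 and parts[0].isdigit():
            pp, tg, n_kv = (int(x) for x in parts[:3])
            t_pp, s_pp, t_tg, s_tg = (float(x) for x in parts[3:])
            rows.append({"n_kv": n_kv, "s_pp": s_pp, "s_tg": s_tg})
    return rows

sample = """\
1024 128 0 3.544 288.95 4.404 29.07
1024 128 48128 37.586 27.24 22.298 5.74
"""
rows = parse_rows(sample)
print(rows[0]["s_tg"] / rows[-1]["s_tg"])  # ~5x TG slowdown at 48k context
```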

[graphs: S_PP and S_TG vs. N_KV from the sweep above]

@sousekd nice graphs, thank you!

The tokens-per-second graph has distinct pits, and then it craters to that level past 40k tokens.
When my graphs looked like that, I had cooling issues caused by DRAM overheating and self-throttling. I put some fans on my RAM, and the graphs became much smoother. So you may have another 30-35% of performance hidden in there to unlock.
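One way to spot this kind of self-throttling programmatically is to flag sweep rows whose TG speed dips well below a local baseline. A crude sketch; `find_dips` is a hypothetical helper (not part of any benchmark tooling), and the 15% tolerance is an arbitrary choice:

```python
# Flag sweep rows whose token-generation speed drops more than `tol`
# below the median of their neighbours - a crude throttling detector.
from statistics import median

def find_dips(s_tg, window=2, tol=0.15):
    dips = []
    for i, v in enumerate(s_tg):
        lo, hi = max(0, i - window), min(len(s_tg), i + window + 1)
        base = median(s_tg[lo:hi])   # local baseline around row i
        if v < (1 - tol) * base:
            dips.append(i)
    return dips

# The 47104-token row above (4.45 t/s between 5.71 and 5.74):
print(find_dips([5.71, 4.45, 5.74]))  # [1]
```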

@anikifoss Thank you for your insight. I think you might be right about thermal throttling. I do have some fairly aggressive BIOS-driven DRAM frequency throttling set up (on top of the DIMMs’ own internal protections). That said, I re-ran it on a cold machine and watched the memory group temps — they never went above 61 °C, which seems pretty reasonable to me. But who knows what the internal DRAM/junction temps were, or how the chips themselves react to that, irrespective of my config.

```
./llama-sweep-bench \
    --model "$MODEL_PATH" \
    --no-mmap \
    -fa on -amb 512 \
    -b 2048 -ub 1024 \
    -ctk f16 -ctv f16 -c 65536 \
    --threads 24 \
    --threads-batch 32 \
    --warmup-batch \
    -n 128
```

[graph: cold-machine re-run of the sweep]

Interestingly, I ran another test after this one which looks perfectly fine: https://huggingface.co/ubergarm/GLM-4.7-Flash-GGUF/discussions/6
But it is quite possible the RAM modules were already too hot, so that run was throttled from start to finish... or the smaller model simply did not cause enough stress for the issue to be noticeable.

Anyway, it is not exactly a typical use case for me - I need to find a way to make the machine run with the GPUs again 😀

GLM-4.7-Flash is pretty awesome too, but now we have Kimi-K2.5 with vision, and MiniMax-M2.5 just came out!

What happened to your GPU?

> GLM-4.7-Flash is pretty awesome too, but now we have Kimi-K2.5 with vision, and MiniMax-M2.5 just came out!

I know! I can’t wait to be able to run them :).

> What happened to your GPU?

I’m not sure yet. The PSU is cutting power with no log entries. The first time, it happened suddenly in the middle of work. After that, with the GPUs connected, I couldn’t keep the server running for more than two minutes. Without the GPUs, everything worked fine for a day, even under heavy load - which is why I ran the CPU-only benchmarks: to test it.

So I tried reconnecting the GPUs one by one. Eventually it failed with any of them, and finally it failed even without GPUs.

My primary suspect is overheating on the AUX PCIe 12V connectors. The motherboard design is terrible: the two connectors sit right next to very hot RAM, basically touching the heat spreader. My second suspect is the PSU, obviously… and then the motherboard. But who knows - it could even be a short on one of the 24 DRAM sticks, say a pin contacting a heat sink after thermal paste degradation.

It could be anything. The issue is easier to reproduce with GPUs connected, and it seems to work a bit better when everything is cold.
So for now I’m focusing on the connectors. We’ll see how it goes.

Cannot wait to be up and running again :).

> Cannot wait to be up and running again :).

Yeah, it sucks dealing with hardware failures. Hope you'll figure it out soon!
