AMD Ryzen AI Max+ 395 Strix Halo
Quantized models benchmarked with Windows ROCm llama.cpp builds from Lemonade, using Unsloth's recommended settings.
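Most entries below run at Q8_0, llama.cpp's simplest quantization: weights are stored in 32-element blocks, each holding one shared fp16 scale plus 32 signed 8-bit values. A toy round-trip sketch of the idea (pure Python, illustrative only; not llama.cpp's actual kernel code):

```python
# Toy sketch of llama.cpp's Q8_0 block quantization: 32 weights share
# one scale; each weight is stored as a signed 8-bit integer.
BLOCK = 32

def q8_0_quantize(weights):
    """Quantize one 32-float block to (scale, list of int8 values)."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127 if amax else 0.0
    quants = [round(w / scale) if scale else 0 for w in weights]
    return scale, quants

def q8_0_dequantize(scale, quants):
    """Reconstruct approximate floats from one quantized block."""
    return [q * scale for q in quants]

block = [(-1) ** i * i / 10 for i in range(BLOCK)]  # toy weights
scale, quants = q8_0_quantize(block)
restored = q8_0_dequantize(scale, quants)
# Worst-case per-weight error is half the scale step.
err = max(abs(a - b) for a, b in zip(block, restored))
```

Q8_0 costs 8 bits per weight plus the per-block scale; the MXFP4 and Q2_K_S entries below trade more reconstruction error for smaller blocks.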
Text Generation • 1B • Note: 145 t/s @ Q8_0. Surprisingly capable in chat. Not usable in OpenCode.
ggml-org/gpt-oss-20b-GGUF
21B • Note: 60 t/s @ MXFP4. OpenCode tools work. Prefer 120B.
mradermacher/Nanbeige4.1-3B-GGUF
4B • Note: 51 t/s @ Q8_0. Thinks for minutes. Not usable in OpenCode.
unsloth/GLM-4.7-Flash-GGUF
Text Generation • 30B • Note: 45 t/s @ Q8_0. OpenCode tool calling works great. Made a nice-looking 400-line OpenMeteo weather app with typeahead search. Required manual TypeScript error fixes to run. Note that the smaller REAP model wasn't faster.
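For context on the recurring test task: an OpenMeteo weather app needs two of Open-Meteo's free, keyless endpoints, the geocoding search (which powers the typeahead box) and the forecast itself. A minimal sketch of the two request URLs (the endpoint paths are Open-Meteo's documented ones; the specific parameter choices are illustrative):

```python
from urllib.parse import urlencode

# Open-Meteo needs two calls: geocoding for the typeahead city search,
# then forecast for the chosen coordinates. No API key required.
def geocoding_url(query, count=5):
    """City-name search; returns candidate places with lat/lon."""
    return ("https://geocoding-api.open-meteo.com/v1/search?"
            + urlencode({"name": query, "count": count}))

def forecast_url(lat, lon):
    """Current weather plus an hourly temperature series."""
    return ("https://api.open-meteo.com/v1/forecast?"
            + urlencode({"latitude": lat, "longitude": lon,
                         "current_weather": "true",
                         "hourly": "temperature_2m"}))

print(geocoding_url("Berlin"))
print(forecast_url(52.52, 13.41))
```

The apps the models produced do the same thing in TypeScript via `fetch`; the URL shape is what matters here.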
bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF
Text Generation • 49B • Note: 45 t/s @ Q8_0. Excellent OpenCode tool calling, including the todo-list and ask-question tools. Made a 600-line OpenMeteo weather app with no errors. Note that it did everything the frontend-design skill said NOT to do, resulting in a comically bad-looking app. Still, the most usable local model on this list.
unsloth/Qwen3.5-35B-A3B-GGUF
Image-Text-to-Text • 35B • Note: TBD
Intel/Qwen3.5-122B-A10B-gguf-q2ks-mixed-AutoRound
122B • Note: TBD
ggml-org/gpt-oss-120b-GGUF
117B • Note: 42 t/s @ MXFP4. Good OpenCode tool calling, writes working TypeScript, but even the frontend-design skill can't get it to make attractive websites. Feels like GPT-4o, which is nice for nostalgia.
unsloth/Qwen3-Coder-Next-GGUF
Text Generation • 80B • Note: 32 t/s @ Q8_0. Could not build a working OpenMeteo weather app. Struggled with the edit tool while attempting to fix errors, and could not properly trace errors in the code.
Intel/MiniMax-M2-REAP-172B-A10B-gguf-q2ks-mixed-AutoRound
173B • Note: 26 t/s @ Q2_K_S. Good chat performance. Didn't try it in OpenCode. Interesting quantization.
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF
Image-Text-to-Text • 9B • Note: 22 t/s @ Q8_0. Perfect tool calling in OpenCode. Fetched the OpenMeteo API schema from GitHub, initialized a Vite project, and made a multi-file React SPA with no errors. Outputs a stray closing </think> tag after every response.
unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
24B • Note: 9 t/s @ Q8_0. All dense models are slow on Strix Halo. Speculative decoding (ngram-mod) works very well when it kicks in.
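N-gram speculative decoding helps here because it needs no draft model: it looks up the last few generated tokens earlier in the context and proposes the tokens that followed that earlier occurrence, which the main model then verifies in a single batch. That is why it "kicks in" mostly on repetitive text such as code. A toy sketch of the proposal step (simplified; not llama.cpp's actual ngram-mod implementation, and the token IDs are made up):

```python
# Toy n-gram draft proposal: if the last n tokens appeared earlier in
# the context, speculate that the same continuation follows again.
def ngram_draft(context, n=2, max_draft=4):
    """Return up to max_draft speculative tokens, or [] if no match."""
    if len(context) <= n:
        return []
    tail = context[-n:]
    # Scan backwards for the most recent *earlier* occurrence of the
    # tail (the range excludes the tail's own position).
    for start in range(len(context) - n - 1, -1, -1):
        if context[start:start + n] == tail:
            return context[start + n:start + n + max_draft]
    return []

# "1 2 3 4 1 2": the earlier "1 2" was followed by "3 4 1 2",
# so those four tokens are drafted for cheap batch verification.
ctx = [1, 2, 3, 4, 1, 2]
print(ngram_draft(ctx))  # -> [3, 4, 1, 2]
```

When the draft is accepted, the model scores several tokens in one forward pass instead of one pass per token, which is the win for slow dense models; a rejected draft costs only the wasted verification pass.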
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF
Image-Text-to-Text • 27B • Note: 8 t/s @ Q8_0. Too slow to try in OpenCode. Notably, didn't overthink on anything. Otherwise, same comments as Devstral 24B.