• IQ4F : IQ4_XS feed-forward (IQ4_NL for ffn_down due to shape constraints)
  • Q8A : Q8_0 attention, Q8_0 output, Q8_0 embeds
  • Q8SH : Q8_0 shared experts
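A mix like the one above can be sketched with llama.cpp's `llama-quantize` tensor-type overrides. This is an illustration only: the exact flag names and tensor-name patterns (`attn`, `ffn_down`, `shexp`) depend on your llama.cpp build and the glm4moe tensor naming, so check `llama-quantize --help` before relying on it.

```shell
# Sketch: IQ4_XS base with Q8_0 attention/output/embeds, Q8_0 shared
# experts, and IQ4_NL for ffn_down. Filenames are placeholders.
llama-quantize \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  --tensor-type attn=q8_0 \
  --tensor-type ffn_down=iq4_nl \
  --tensor-type shexp=q8_0 \
  model-F16.gguf model-IQ4F-Q8A-Q8SH.gguf iq4_xs
```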

Reasonable speeds on a 24 GiB GPU + 64 GB RAM with long context.

Format: GGUF
Model size: 110B params
Architecture: glm4moe


Model tree for Beinsezii/GLM-4.5-Air-Derestricted-IQ4F-Q8A-Q8SH-GGUF
