- IQ4F : IQ4_XS feed-forward (IQ4_NL for ffn_down due to shape constraints)
- Q8A : Q8_0 attention, Q8_0 output, Q8_0 embeds
- Q8SH : Q8_0 shared experts
Gives readable generation speeds on a 24 GiB GPU + 64 GB RAM, even with long context.
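A mix like the one above can be reproduced with llama.cpp's `llama-quantize` by overriding quantization types per tensor. The sketch below is a hypothetical invocation, not the exact command used for this upload: it assumes a recent llama.cpp build with the `--tensor-type PATTERN=TYPE` override, and the file names and MoE tensor-name patterns (`ffn_*_exps`, `ffn_*_shexp`) are placeholders that should be checked against the actual model's tensor names.

```shell
# Hypothetical sketch, assuming a llama.cpp build with --tensor-type overrides.
# Paths and tensor-name regexes are placeholders; verify against the model.
./llama-quantize \
  --tensor-type "ffn_(gate|up)_exps=iq4_xs" \
  --tensor-type "ffn_down_exps=iq4_nl" \
  --tensor-type "ffn_.*_shexp=q8_0" \
  --tensor-type "attn_.*=q8_0" \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  model-F16.gguf model-IQ4F.gguf iq4_xs
```

The trailing base type (`iq4_xs`) applies to any tensor no override matches; `ffn_down` gets IQ4_NL because IQ4_XS's block size does not divide some ffn_down shapes.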