- Q4F: Q4_K feed-forward (Q5_1 for ffn_down due to shape constraints)
- Q8A: Q8_0 attention, Q8_0 output, Q8_0 embeddings
- Q8SH: Q8_0 shared experts
Runs at usable generation speeds on a 24 GiB GPU plus 64 GB of system RAM.
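The per-type bit widths of these GGUF formats (Q4_K ≈ 4.5 bits/weight, Q5_1 = 6.0, Q8_0 = 8.5) make it easy to sanity-check whether a mixed layout like the one above fits in a given VRAM budget. A minimal sketch; the parameter split used below is a made-up example, not this model's actual tensor counts:

```python
# Rough size estimate for a mixed-quant GGUF layout.
# Bits-per-weight follow the standard block layouts:
#   Q4_K ~ 4.5 bpw, Q5_1 = 6.0 bpw, Q8_0 = 8.5 bpw.
BPW = {"Q4_K": 4.5, "Q5_1": 6.0, "Q8_0": 8.5}

def size_gib(params_by_type):
    """params_by_type: {quant_type: parameter_count} -> approx size in GiB."""
    bits = sum(BPW[t] * n for t, n in params_by_type.items())
    return bits / 8 / 2**30

# Hypothetical split: 10B params as Q4_K feed-forward, 2B as Q5_1 ffn_down,
# 3B as Q8_0 attention/output/embeddings.
est = size_gib({"Q4_K": 10e9, "Q5_1": 2e9, "Q8_0": 3e9})
print(f"~{est:.1f} GiB")  # → ~9.6 GiB
```

This counts weight storage only; KV cache and activation buffers add to the total, which is why a layout that technically fits in 24 GiB may still need layers offloaded to system RAM.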