GGUF/llama.cpp support
#1
by tcpmux - opened
Would be awesome!
It may already be supported since it's just llama architecture. There are GGUFs of the base model already uploaded. As long as it doesn't mirror/echo from the instruction tuning, it should be a good one.
> It may already be supported since it's just llama architecture
Sadly, it's not. It can be converted and quantized, but the resulting file isn't accepted by llama.cpp; it crashes with errors when loading.
Does it run if you change the metadata to one of the Qwens? I haven't looked deeply at the whole architecture or at whether they actually did anything new.
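For anyone debugging the rejected file: llama.cpp picks its loader based on the `general.architecture` metadata key, so a quick first step is checking what the converted GGUF actually declares. Here's a minimal stdlib-only sketch (the helper names are mine; the byte layout follows the GGUF v3 spec, and it only handles string-valued keys for brevity):

```python
# Sketch: peek at a GGUF stream's header and its general.architecture key,
# the field llama.cpp uses to select a model architecture at load time.
# Layout per the GGUF v3 spec; helper names here are hypothetical.
import io
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # value-type id for strings in the GGUF spec

def read_gguf_string(f):
    # GGUF strings: uint64 length followed by UTF-8 bytes
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")

def peek_architecture(f):
    """Return (version, architecture) from a GGUF stream, or raise ValueError."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", f.read(4))
    _tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    for _ in range(kv_count):
        key = read_gguf_string(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        if vtype != GGUF_TYPE_STRING:
            raise ValueError("sketch only handles string-valued metadata")
        value = read_gguf_string(f)
        if key == "general.architecture":
            return version, value
    raise ValueError("general.architecture not found")

def gguf_string(s):
    b = s.encode("utf-8")
    return struct.pack("<Q", len(b)) + b

# Build a tiny in-memory GGUF header claiming the llama architecture,
# just to exercise the parser without needing a real model file.
blob = (GGUF_MAGIC
        + struct.pack("<I", 3)        # version 3
        + struct.pack("<QQ", 0, 1)    # 0 tensors, 1 metadata kv pair
        + gguf_string("general.architecture")
        + struct.pack("<I", GGUF_TYPE_STRING)
        + gguf_string("llama"))

print(peek_architecture(io.BytesIO(blob)))  # -> (3, 'llama')
```

On a real file you'd pass `open("model.gguf", "rb")` instead of the synthetic blob; if the declared architecture isn't one llama.cpp knows, that would explain the crash on load.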