Update query_clarification gguf
#29
by kndtran - opened
No description provided.
The original query_clarification/granite4_micro/lora/Lora-q8_0.gguf was only 288 bytes: it contained the GGUF header but no tensor data. The cause is a bug in llama.cpp's convert_lora_to_gguf.py, where LoraTorchTensor lacks a split() method. The Granite model handler calls split() when processing the fused FFN weights (shared_mlp.input_linear.weight), which the other intrinsics don't train LoRA adapters on.
Regenerated from the source safetensors with a patched converter. The new file is 66 MB with 560 tensors (vs 14-15 MB for the other intrinsics, because query_clarification trains on additional FFN layers). Tested and confirmed working.
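For context, the gist of the converter patch can be sketched as follows. This is a minimal numpy illustration of why split() is well-defined for a LoRA factorization, not the upstream llama.cpp code: the class name, method signature, and shapes here are assumptions. The key observation is that a LoRA weight W = B @ A can be split along its output dimension by splitting only B row-wise, sharing A across the chunks, which is exactly what separating a fused gate/up projection requires.

```python
import numpy as np

class LoraTensor:
    """Toy stand-in for the converter's LoraTorchTensor wrapper.

    Represents a LoRA weight as the low-rank product W = B @ A.
    Names and semantics are illustrative, not the upstream API.
    """
    def __init__(self, A, B):
        self.A = A  # (rank, in_features), shared across output rows
        self.B = B  # (out_features, rank)

    def dense(self):
        # Materialize the full-rank weight delta.
        return self.B @ self.A

    def split(self, sizes, dim=0):
        # Splitting along the output dimension (dim 0 of W) only needs
        # B split row-wise; every chunk reuses the same A. Splits along
        # the input dimension are not needed for a fused FFN projection.
        if dim != 0:
            raise NotImplementedError("only output-dim splits are sketched")
        chunks, start = [], 0
        for s in sizes:
            chunks.append(LoraTensor(self.A, self.B[start:start + s]))
            start += s
        return chunks

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 16))   # rank 4, in_features 16
B = rng.standard_normal((24, 4))   # fused out_features 24 = 12 + 12
fused = LoraTensor(A, B)
gate, up = fused.split([12, 12], dim=0)
# Stacking the dense halves reproduces the fused weight exactly.
assert np.allclose(np.vstack([gate.dense(), up.dense()]), fused.dense())
```

Because the split is exact, regenerating the GGUF with a converter that implements it recovers all of the FFN adapter tensors that the original 288-byte file was silently dropping.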
kndtran changed pull request status to open
frreiss changed pull request status to merged