Issue with "gemma-3-12b-it-Q6_K.gguf" and maybe all of the CLIP GGUFs

#13

by mdkb - opened Jan 20

EDIT OF THE EDIT: I can't delete this comment, but I still take it back: this was not the cause. Adding in the distilled LoRA at strength -0.3 solves the issue, even with a GGUF as the main model and this GGUF in the CLIP slot.

It looks like it might be at least partially responsible for the "frozen image, slow zoom-in" problem when making video with LTX-2 from an audio file instead of a prompt. There are ways to tweak things and force it to work, but many people using the GGUFs end up with this issue. When I swapped it out for the fp8 version in the CLIP slot, lipsync worked perfectly from audio. Other factors might compound it, but it was definitely involved in the problem.

EDIT: I am taking this back, sorry. I just tested further and it was the distilled LoRA, not this GGUF CLIP, so I will delete this comment after posting this edit.

doublemathew

Jan 20

Did you download the corresponding mmproj file and name it appropriately?

https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/blob/main/mmproj-BF16.gguf

If you download that and make sure to prefix it with the base model name so that it looks like gemma-3-12b-it-mmproj-BF16.gguf, it will load automatically along with the rest of the text encoder in the CLIP GGUF Loader.
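
A minimal sketch of that renaming, assuming the loader matches the projector purely by the filename prefix described above. `link_mmproj` is a hypothetical helper (not part of ComfyUI-GGUF): it symlinks the downloaded file into a folder, prepending the base model name to the original file name.

```bash
# Hypothetical helper (not part of ComfyUI-GGUF).
# Assumption: the CLIP GGUF Loader pairs "<base>-mmproj-*.gguf" with "<base>*.gguf".
# Usage: link_mmproj SRC DEST_DIR BASE_NAME
# Creates DEST_DIR/<BASE_NAME>-<basename of SRC> as a symlink to SRC.
link_mmproj() {
  local src="$1" dest_dir="$2" base="$3"
  if [ ! -f "$src" ]; then
    echo "link_mmproj: source not found: $src" >&2
    return 1
  fi
  # Use an absolute target so the link still resolves from inside DEST_DIR.
  local abs_src
  abs_src="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
  mkdir -p "$dest_dir"
  # -f replaces an existing link of the same name.
  ln -sf "$abs_src" "$dest_dir/${base}-$(basename "$src")"
}
```

For example, running `link_mmproj "$(hf download unsloth/gemma-3-12b-it-GGUF mmproj-BF16.gguf --quiet)" text_encoders gemma-3-12b-it` from the ComfyUI models folder should produce `text_encoders/gemma-3-12b-it-mmproj-BF16.gguf`. Symlinking instead of copying keeps a single copy of the file in the Hugging Face cache.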

mdkb

Jan 21

I didn't do that, but I will, thanks.

mdkb changed discussion status to closed Jan 21

doublemathew

Jan 21

Oh, I also realized this repo is the wrong one. LTX-2 uses a QAT checkpoint, so unsloth/gemma-3-12b-it-qat-GGUF is the correct repo to use.
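
Since it is easy to end up with the non-QAT file, a quick filename check of the text encoder folder can help. This is only a sketch that looks at file names: it assumes QAT files keep the `gemma-3-12b-it-qat-` prefix used in unsloth/gemma-3-12b-it-qat-GGUF, it skips mmproj files, and `audit_gemma_encoders` is a hypothetical name.

```bash
# Hypothetical helper: flag Gemma 3 12B text encoder GGUFs that are not QAT.
# Assumption: QAT files are named "gemma-3-12b-it-qat-*.gguf" (filename check only).
# Returns 1 if any non-QAT encoder is found, 0 otherwise.
audit_gemma_encoders() {
  local dir="${1:-text_encoders}" f name bad=0
  for f in "$dir"/gemma-3-12b-it*.gguf; do
    [ -e "$f" ] || continue            # glob matched nothing
    name="$(basename "$f")"
    case "$name" in
      *mmproj*) continue ;;            # vision projector, not the text encoder
      gemma-3-12b-it-qat-*) echo "ok: $name" ;;
      *) echo "non-qat: $name"; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Run it as `audit_gemma_encoders text_encoders` from the ComfyUI models folder; a file like gemma-3-12b-it-Q6_K.gguf would be reported as `non-qat`.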

Check https://huggingface.co/unsloth/LTX-2-GGUF/discussions/7 for a workflow reference.

Pasted below for convenience:

The GGUFs for LTX-2 require a few extra components to be loaded, since the GGUFs don't have the VAEs and embedding connectors packaged with the transformer model.

You also need to install two custom node packages:
https://github.com/city96/ComfyUI-GGUF
https://github.com/kijai/ComfyUI-KJNodes
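
Both packages can be installed the usual way for ComfyUI custom nodes, by cloning them into `custom_nodes/`. Below is a sketch of that; `install_custom_node` and the `COMFY_DIR` variable are hypothetical names, and it assumes `python -m pip` targets the same Python environment ComfyUI runs in.

```bash
# Hypothetical helper; COMFY_DIR is an assumed variable pointing at your
# ComfyUI checkout (defaults to the current directory).
# Assumption: `python -m pip` installs into the environment ComfyUI uses.
install_custom_node() {
  local url="$1"
  local nodes_dir="${COMFY_DIR:-.}/custom_nodes"
  local name
  name="$(basename "$url" .git)"   # e.g. ComfyUI-GGUF
  mkdir -p "$nodes_dir"
  if [ -d "$nodes_dir/$name" ]; then
    echo "skip: $name already present"
  else
    git clone --depth 1 "$url" "$nodes_dir/$name" || return 1
  fi
  # Install Python dependencies only if the package ships a requirements file.
  if [ -f "$nodes_dir/$name/requirements.txt" ]; then
    python -m pip install -r "$nodes_dir/$name/requirements.txt" || return 1
  fi
}
```

From the ComfyUI root (or with `COMFY_DIR` set), run `install_custom_node https://github.com/city96/ComfyUI-GGUF` and `install_custom_node https://github.com/kijai/ComfyUI-KJNodes`, then restart ComfyUI so the new nodes are picked up.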

Navigate to your ComfyUI model folder and run the following to download all the model weights:

```bash
# Requires the `hf` CLI from huggingface_hub.
# Make sure the target subfolders exist before linking into them.
mkdir -p unet vae text_encoders loras latent_upscale_models

# Can try any quant type
ln -s "$(hf download unsloth/LTX-2-GGUF ltx-2-19b-dev-UD-Q2_K_XL.gguf --quiet)" unet/ltx-2-19b-dev-UD-Q2_K_XL.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_audio_vae.safetensors --quiet)" vae/ltx-2-19b-dev_audio_vae.safetensors
ln -s "$(hf download unsloth/LTX-2-GGUF vae/ltx-2-19b-dev_video_vae.safetensors --quiet)" vae/ltx-2-19b-dev_video_vae.safetensors

# Can try any quant type
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-UD-Q4_K_XL.gguf
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/gemma-3-12b-it-qat-mmproj-BF16.gguf
ln -s "$(hf download unsloth/LTX-2-GGUF text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/ltx-2-19b-dev_embeddings_connectors.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-19b-distilled-lora-384.safetensors --quiet)" loras/ltx-2-19b-distilled-lora-384.safetensors
ln -s "$(hf download Lightricks/LTX-2 ltx-2-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/ltx-2-spatial-upscaler-x2-1.0.safetensors

# Optional
ln -s "$(hf download Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-Left ltx-2-19b-lora-camera-control-dolly-left.safetensors --quiet)" loras/ltx-2-19b-lora-camera-control-dolly-left.safetensors
```

This MP4 should have the LTX-2 workflow used to generate it embedded, and that workflow references the models downloaded above.
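
Before loading that workflow, it can help to confirm that every link from the download step actually resolves. A sketch of such a check; `check_models` is a hypothetical helper, and it only tests whether each path exists or is a dangling symlink, not whether the file contents are valid.

```bash
# Hypothetical helper: run from the ComfyUI models folder and pass the link
# paths created by the download commands. Prints one line per problem and
# returns 0 only if every path resolves to an existing file.
check_models() {
  local problems=0 f
  for f in "$@"; do
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      # Link exists but its target is gone (e.g. the Hugging Face cache was cleared).
      echo "dangling: $f"
      problems=$((problems + 1))
    elif [ ! -e "$f" ]; then
      # Nothing at this path (e.g. that `hf download` or `ln -s` failed).
      echo "missing: $f"
      problems=$((problems + 1))
    fi
  done
  [ "$problems" -eq 0 ]
}
```

For example: `check_models unet/ltx-2-19b-dev-UD-Q2_K_XL.gguf vae/ltx-2-19b-dev_audio_vae.safetensors vae/ltx-2-19b-dev_video_vae.safetensors text_encoders/gemma-3-12b-it-qat-mmproj-BF16.gguf`, extended with the remaining paths from the list.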

mdkb

Jan 21

lol, that would explain the weight error messages, but tbh it has been working, though I've started switching back to the fp8 version for LTX anyway. It's also possible I didn't even download it from here: I just noticed these uploads are 9 months old. I followed the trail back here from a search showing I had visited this link before, and didn't bother to check whether the files existed before LTX. But it's been enlightening anyway, so thanks.