| --- |
| license: apache-2.0 |
| base_model: |
| - Qwen/Qwen-Image-2512 |
| - Qwen/Qwen2.5-VL-7B-Instruct |
| --- |
| |
| Update 19/02/26: uploaded v2 with some layers kept at BF16 and longer calibration (more steps, adjusted learning rate, etc.). Clearly better than my first version; it no longer gets "fuzzy" on edges or hair. |
|
|
| -------- |
|
|
| NVFP4 versions of [Qwen-Image-2512](https://huggingface.co/Qwen/Qwen-Image-2512) and [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), both converted from the BF16 versions in the [Comfy-Org/Qwen-Image_ComfyUI](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI) repo. Made with the [Silveroxides/convert_to_quant](https://github.com/silveroxides/convert_to_quant) script. |
| Drop-in replacements in ComfyUI; expect a bit of quality loss, as with Nunchaku models or other NVFP4 quants. |
|
|
| Conversion command: |
| `convert_to_quant -i qwen_image_2512_bf16.safetensors -o qwen-image2512-nvfp4.safetensors --nvfp4 --comfy_quant --qwen` |
|
|
| ⚠️ I strongly suspect a bottleneck somewhere (memory bandwidth? dequant ops?) that throttles the GPU in some cases; running small batches instead of single images seems to avoid it. I only have surface-level knowledge of this stuff and don't know how to troubleshoot it further. |
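If you want to see whether the GPU is actually being throttled during a generation, `nvidia-smi` can poll clocks and throttle reasons alongside the run (a generic diagnostic sketch, not specific to this model):

```shell
# Poll SM clock, GPU utilization, power draw and active throttle reasons
# once per second while ComfyUI is generating. A sustained drop in
# clocks.sm with a non-zero clocks_throttle_reasons.active bitmask
# suggests the GPU is being throttled rather than simply idle.
nvidia-smi \
  --query-gpu=clocks.sm,utilization.gpu,power.draw,clocks_throttle_reasons.active \
  --format=csv -l 1
```

Low utilization with high clocks would instead point at a CPU-side or dequant-op bottleneck rather than thermal/power throttling.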
|
|
| | Batch of 4 | Qwen-Image-2512-fp8 | Qwen-Image-2512-nvfp4 | |
| |------------|------------------------|------------------------| |
| | Vanilla | 194s (45.6s per image) | 76s (18.3s per image) | |
| | Lightning | 20.4s (5s per image) | 7.5s (1.75s per image) | |
|
|
| (Wall time on a 5090. Vanilla = 50 steps, CFG 4; Lightning = 8 steps, CFG 1 with the [Lightning LoRA](https://huggingface.co/lightx2v/Qwen-Image-2512-Lightning).) |
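From the batch-of-4 wall times above, NVFP4 works out to roughly a 2.5–2.7× speedup over FP8 (a quick sanity-check calculation, nothing more):

```python
# Wall times in seconds for a batch of 4 images, taken from the table above.
times = {
    "Vanilla":   {"fp8": 194.0, "nvfp4": 76.0},
    "Lightning": {"fp8": 20.4,  "nvfp4": 7.5},
}

for setting, t in times.items():
    speedup = t["fp8"] / t["nvfp4"]
    print(f"{setting}: {speedup:.2f}x faster with NVFP4")
# Vanilla: 2.55x faster with NVFP4
# Lightning: 2.72x faster with NVFP4
```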