Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-NVFP4
Overview
This is a partial NVFP4 quantization of Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v by lightx2v, produced using convert_to_quant by silveroxides.
Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an image-to-video generation model built on Wan2.1-I2V-14B-480P. It applies step distillation and classifier-free guidance distillation to reduce inference to 4 steps without CFG, cutting generation time substantially while preserving output quality.
IMPORTANT
NVFP4 is only supported on NVIDIA Blackwell architecture GPUs. Running this model therefore requires a Blackwell GPU with NVFP4 support enabled in torch, along with a recent version of ComfyUI and a comfy-kitchen build compiled against CUDA 13.
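A quick way to verify the hardware requirement is to check the GPU's CUDA compute capability: Blackwell parts report major version 10 (datacenter) or 12 (consumer, e.g. the RTX 50 series). The helper below is a minimal sketch, not part of any shipped tool; the `supports_nvfp4` name is hypothetical.

```python
def supports_nvfp4(compute_capability):
    """NVFP4 tensor-core kernels require Blackwell, i.e. CUDA compute
    capability with major version 10 or newer (12.x for RTX 50-series)."""
    major, _minor = compute_capability
    return major >= 10

# In a CUDA-enabled environment, query the capability with:
#   import torch
#   supports_nvfp4(torch.cuda.get_device_capability(0))

print(supports_nvfp4((12, 0)))  # Blackwell consumer (e.g. RTX 5060) -> True
print(supports_nvfp4((8, 9)))   # Ada Lovelace -> False
```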
Quantization
The model weights have been partially quantized to NVFP4 (NVIDIA Floating Point 4-bit), a quantization format supported on NVIDIA Blackwell architecture GPUs. Of the 480 layers eligible for quantization, only a subset has been quantized to NVFP4; the remaining eligible layers are either quantized to FP8 or kept in BF16 to preserve output quality.
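For intuition, NVFP4 stores weights as 4-bit E2M1 values in blocks of 16 elements, each block sharing a higher-precision scale factor. The sketch below illustrates the per-block rounding step only; it is not the convert_to_quant kernel, and the real format stores the block scales in FP8 E4M3 rather than full precision.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately).
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_nvfp4(block):
    """Quantize one 16-element block: choose a scale so the largest magnitude
    maps to 6.0 (the E2M1 maximum), then round each scaled element to the
    nearest representable value. Illustrative sketch only."""
    scale = float(np.abs(block).max()) / 6.0 or 1.0
    signs = np.sign(block)
    mags = np.abs(block) / scale
    idx = np.abs(mags[:, None] - E2M1_VALUES[None, :]).argmin(axis=1)
    return signs * E2M1_VALUES[idx], scale

rng = np.random.default_rng(0)
w = rng.standard_normal(16).astype(np.float32)
q, s = quantize_block_nvfp4(w)
print(np.abs(w - q * s).max())  # worst-case rounding error for this block
```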
The quantization format assigned to each layer is based on a sensitivity analysis performed with a custom script, which scores each weight tensor using excess kurtosis, dynamic range, and aspect ratio. Thresholds are derived automatically from the model's own score distribution.
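The statistics named above can be computed per weight tensor along these lines. This is a hypothetical reimplementation of the idea, not the author's script: heavy-tailed weights (high excess kurtosis) and a wide dynamic range are classic signs that a tensor will quantize poorly at 4 bits.

```python
import numpy as np

def sensitivity_stats(w):
    """Score one 2-D weight tensor; assumes at least one nonzero element."""
    x = w.astype(np.float64).ravel()
    mu, sigma = x.mean(), x.std()
    # Excess kurtosis: 0 for a Gaussian, large for heavy-tailed weights.
    excess_kurtosis = ((x - mu) ** 4).mean() / sigma**4 - 3.0
    # Dynamic range in bits between the largest and smallest nonzero magnitude.
    nonzero = np.abs(x[x != 0])
    dynamic_range = np.log2(nonzero.max() / nonzero.min())
    aspect_ratio = max(w.shape) / min(w.shape)
    return {"excess_kurtosis": excess_kurtosis,
            "dynamic_range_bits": dynamic_range,
            "aspect_ratio": aspect_ratio}
```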
The analysis yields the following convert_to_quant parameters. The conversion takes about 140 minutes on an RTX 5060 and produces a 9.76 GB safetensors file.
```shell
convert_to_quant -i Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-bf16.safetensors \
  --nvfp4 --wan --comfy_quant --save-quant-metadata \
  --custom-type fp8 \
  --custom-layers "blocks\.(1|2|3)\.cross_attn\.k\.weight|blocks\.(6|8|9|10)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.cross_attn\.v\.weight|blocks\.(6)\.cross_attn\.q\.weight|blocks\.(6|14)\.cross_attn\.o\.weight|blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(36|37|38|39)\.ffn\.0\.weight" \
  --exclude-layers "blocks\.(4|5|7)\.cross_attn\.k\.weight|blocks\.(0)\.cross_attn\.q\.weight|blocks\.(5|7|9|10|11|12|19|20)\.cross_attn\.o\.weight" \
  -o Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-nvfp4.safetensors
```
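The two regex flags partition the eligible layers into three buckets: excluded layers stay in BF16, custom layers get FP8, and everything else becomes NVFP4. The sketch below demonstrates this selection logic on excerpts of the patterns above; the `assign_format` helper and the use of `fullmatch` are assumptions for illustration, not convert_to_quant internals.

```python
import re

# Excerpts of the --custom-layers and --exclude-layers patterns above.
CUSTOM_FP8 = re.compile(
    r"blocks\.(1|2|3)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight")
EXCLUDE_BF16 = re.compile(r"blocks\.(4|5|7)\.cross_attn\.k\.weight")

def assign_format(name):
    """Excluded layers are left in BF16, custom layers become FP8,
    and any other eligible layer is quantized to NVFP4."""
    if EXCLUDE_BF16.fullmatch(name):
        return "BF16"
    if CUSTOM_FP8.fullmatch(name):
        return "FP8"
    return "NVFP4"

print(assign_format("blocks.2.cross_attn.k.weight"))  # FP8
print(assign_format("blocks.5.cross_attn.k.weight"))  # BF16
print(assign_format("blocks.20.ffn.0.weight"))        # NVFP4
```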
The table below details the quantization format applied per layer type across block ranges:
| Layer | 0–3 | 4–9 | 10–15 | 16–22 | 23–29 | 30–35 | 36–39 |
|---|---|---|---|---|---|---|---|
| self_attn.q | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| self_attn.k | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| self_attn.v | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| self_attn.o | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.q | BF16 (25%) / NVFP4 (75%) | FP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.k | FP8 (75%) / NVFP4 (25%) | BF16 (50%) / FP8 (50%) | FP8 (17%) / NVFP4 (83%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.v | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.o | NVFP4 | BF16 (50%) / FP8 (17%) / NVFP4 (33%) | BF16 (50%) / FP8 (17%) / NVFP4 (33%) | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.k_img | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| cross_attn.v_img | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
| ffn.0 | FP8 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | FP8 |
| ffn.2 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 | NVFP4 |
Inference
The model can be used in ComfyUI with the following parameters, based on the distilled model's own recommendations:
| Parameter | Value |
|---|---|
| Shift | 5.0 |
| Sampler | LCM |
| Scheduler | normal |
| CFG | 1.0 |
| Steps | 4 |
The sampler/scheduler combinations euler/simple and heun/linear_quadratic are also known to produce good results.
The model is designed to generate 81 frames and is not compatible with LoRAs. Sampling completes in under 60 seconds on an RTX 5060, making it possible to produce a full 81-frame video in under two minutes; with RIFE, those 81 frames convert to a 10-second video.
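The timing claim above works out as follows, assuming Wan2.1's native 16 fps output and 2x RIFE interpolation (both assumptions, since the card does not state them explicitly):

```python
# 81 native frames at 16 fps run about 5 s; 2x interpolation inserts one
# frame between each pair, yielding 161 frames, or roughly 10 s of video.
native_frames = 81
fps = 16
interpolated = 2 * native_frames - 1
print(native_frames / fps)   # seconds of native footage
print(interpolated / fps)    # seconds after 2x interpolation
```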
Abrupt camera movements or fast subject motion may produce artifacts. This is an inherent limitation of applying aggressive quantization to an already distilled model.
License Agreement
This model is licensed under the Apache 2.0 License. You retain full ownership of your generated content, but are solely responsible for its use in compliance with the license terms and applicable laws.
Acknowledgements
Big kudos to the contributors to the Wan2.1 and Self-Forcing repositories for their open research, and to silveroxides for their quantization tools.
Model tree for InsecureErasure/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v-NVFP4
Base model
Wan-AI/Wan2.1-I2V-14B-480P