Disty0 committed (verified) · Commit 75526de · Parent: 4d9d38f

Create README.md
---
license: other
license_name: flux-non-commercial-license
license_link: LICENSE.md
base_model:
- black-forest-labs/FLUX.2-klein-4B
base_model_relation: quantized
library_name: diffusers
tags:
- sdnq
- flux
- 4-bit
---
Dynamic 4-bit quantization of [black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B) using [SDNQ](https://github.com/Disty0/sdnq).

This model uses per-layer, fine-grained quantization.
The dtype for each layer is selected dynamically by trial and error until the std-normalized MSE loss falls below the selected threshold.

The minimum allowed dtype is set to uint4 and the std-normalized MSE loss threshold is set to 1e-2.
This produced a mixed-precision model with uint4 and int5 dtypes.
SVD quantization is disabled.
Usage:
```
pip install sdnq
```

```py
import torch
import diffusers
from sdnq import SDNQConfig  # import sdnq to register it into diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.Flux2KleinPipeline.from_pretrained("Disty0/FLUX.2-klein-4B-SDNQ-4bit-dynamic", torch_dtype=torch.bfloat16)

# Enable INT8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)
    # pipe.transformer = torch.compile(pipe.transformer)  # optional for faster speeds

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.manual_seed(0),
).images[0]

image.save("flux-klein-sdnq-4bit-dynamic-svd-r32.png")
```

Original BF16 vs SDNQ quantization comparison:

| Quantization | Model Size | Visualization |
| --- | --- | --- |
| Original BF16 | 7.8 GB | ![Original BF16](https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/Eu8tAf8M-HRMBtPm5mgrw.png) |
| SDNQ 4 Bit | 2.5 GB | ![SDNQ UINT4](https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/RGBdjs--EmSvWFuhovwEe.png) |
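As a back-of-the-envelope sanity check on the sizes above (assuming roughly 4B parameters, which is an estimate from the model name, and an average of ~4.5 bits per weight for the uint4/int5 mix):

```python
params = 4e9            # approximate parameter count (assumption from the model name)
avg_bits = 4.5          # rough average of the uint4 and int5 mix (assumption)

bf16_gb = params * 16 / 8 / 1e9       # 16 bits per weight -> bytes -> GB
mixed_gb = params * avg_bits / 8 / 1e9
```

This gives about 8 GB for BF16 and about 2.3 GB for the mixed-precision weights, in the same ballpark as the reported 7.8 GB and 2.5 GB once quantization scales and any unquantized layers are accounted for.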