Disty0 committed (verified) · Commit 75526de · Parent: 4d9d38f

Create README.md
---
license: other
license_name: flux-non-commercial-license
license_link: LICENSE.md
base_model:
- black-forest-labs/FLUX.2-klein-4B
base_model_relation: quantized
library_name: diffusers
tags:
- sdnq
- flux
- 4-bit
---
Dynamic 4-bit quantization of [black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B) using [SDNQ](https://github.com/Disty0/sdnq).

This model uses per-layer, fine-grained quantization.
The dtype for each layer is selected dynamically by trial and error until the std-normalized MSE loss falls below the selected threshold.

The minimum allowed dtype is set to uint4 and the std-normalized MSE loss threshold is set to 1e-2.
This produced a mixed-precision model with uint4 and int5 dtypes.
SVD quantization is disabled.
Usage:
```
pip install sdnq
```

```py
import torch
import diffusers
from sdnq import SDNQConfig  # import sdnq to register it into diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.Flux2KleinPipeline.from_pretrained("Disty0/FLUX.2-klein-4B-SDNQ-4bit-dynamic", torch_dtype=torch.bfloat16)

# Enable INT8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)
    # pipe.transformer = torch.compile(pipe.transformer)  # optional for faster speeds

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.manual_seed(0),
).images[0]

image.save("flux-klein-sdnq-4bit-dynamic-svd-r32.png")
```

Original BF16 vs SDNQ quantization comparison:

| Quantization | Model Size | Visualization |
| --- | --- | --- |
| Original BF16 | 7.8 GB | ![Original BF16](https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/Eu8tAf8M-HRMBtPm5mgrw.png) |
| SDNQ 4 Bit | 2.5 GB | ![SDNQ UINT4](https://cdn-uploads.huggingface.co/production/uploads/6456af6195082f722d178522/RGBdjs--EmSvWFuhovwEe.png) |
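As a back-of-the-envelope sanity check on the sizes above (assuming roughly 4B parameters, which is an estimate from the model name, and an average of ~4.5 bits per weight for the uint4/int5 mix):

```python
params = 4e9            # approximate parameter count (assumption from the model name)
avg_bits = 4.5          # rough average of the uint4 and int5 mix (assumption)

bf16_gb = params * 16 / 8 / 1e9       # 16 bits per weight -> bytes -> GB
mixed_gb = params * avg_bits / 8 / 1e9
```

This gives about 8 GB for BF16 and about 2.3 GB for the mixed-precision weights, in the same ballpark as the reported 7.8 GB and 2.5 GB once quantization scales and any unquantized layers are accounted for.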