---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
base_model:
- CabalResearch/NoobAI-Flux2VAE-RectifiedFlow
library_name: diffusers
---
## Model Details
A continuation of the [NoobAI Flux2 VAE experiment](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow).
More info on supporting us: [click me](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow-0.3#potential-future)
### Model Description
Resumed for 4 more epochs, and the model has shown a nice improvement. We observe good convergence to new details that were hard to achieve on the prior architecture. Composition and stability are strongly improved relative to Epoch 2, as is downstream trainability (e.g. for LoRAs).
The current state is usable for normal generation, so we encourage you to try it. We will provide an [easy node for ComfyUI](https://github.com/Anzhc/SDXL-Flux2VAE-ComfyUI-Node), as well as a basic workflow. If you are an A1111 user, please use [ReForge](https://github.com/Panchovix/stable-diffusion-webui-reForge); it has native support, and instructions are below.

Once again, we are working with limited compute, but are quite happy with the result so far, and hope to continue working on the model.
- **Developed by:** Cabal Research (Bluvoll, Anzhc)
- **Funded by:** Community
- **License:** [fair-ai-public-license-1.0-sd](https://freedevproject.org/faipl-1.0-sd/)
- **Finetuned from model:** [NoobAI Flux2 VAE experiment](https://huggingface.co/CabalResearch/NoobAI-Flux2VAE-RectifiedFlow)
## Bias and Limitations
While we are seeing a new level of detail, it is still too early to call it a day. Complex intersections, extremely small details, abstract scenes, and wide shots will still pose a significant challenge to the model and result in noise-like patterns, but we see steady progress in resolving that noise across epochs.
Most biases of the official dataset will apply (Blue Archive, etc.).
We have yet to reach stable composition and anatomy, but good LoRAs help drastically with this at the current stage.
## Model Output Examples







P.S. We are pretty bad at generating images; on Epoch 2 we saw quite a few examples of much better generations than what we've shown, and we wonder if that will be the case this time as well.
# Recommendations
### Inference
#### Comfy

(A workflow is available alongside the model in the repo)
We will provide a Node, and hope it will eventually be adopted natively in the main repo:
**https://github.com/Anzhc/SDXL-Flux2VAE-ComfyUI-Node**
Same as your normal inference, but with the addition of the SD3 sampling node, as this model is flow-based.
Recommended Parameters:
**Sampler**: Euler, Euler A, DPM++ SDE, etc.
**Steps**: 20-28
**CFG**: 6-9
**Shift**: 3-12
**Schedule**: Normal/Simple/SGM Uniform/Quadratic
**Positive Quality Tags**: `masterpiece, best quality`
**Negative Tags**: `worst quality, normal quality, bad anatomy`
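
For intuition on the Shift parameter: as far as we understand it, it applies the standard SD3-style time shift to the flow sampling schedule. A minimal, illustrative sketch (the function below is ours, not the node's actual code):

```python
import torch

def shift_sigmas(sigmas: torch.Tensor, shift: float = 3.0) -> torch.Tensor:
    """SD3-style time shift: larger values push more sampling steps toward
    the high-noise region, which flow-based models generally benefit from."""
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)

# Example: a uniform 24-step flow schedule from pure noise (1.0) to clean (0.0).
sigmas = torch.linspace(1.0, 0.0, 24 + 1)
print(shift_sigmas(sigmas, shift=3.0))
```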
#### A1111 WebUI
(All screenshots repeat our RF release, as there is no difference in setup)
Recommended WebUI: [ReForge](https://github.com/Panchovix/stable-diffusion-webui-reForge) - it has native support for Flow models, and we have PR'd support for our Flux2VAE-based SDXL modification.
**How to use in ReForge**:

(ignore the Sigma max field at the top; it is not used in RF)
Support for RF in ReForge is being implemented through a built-in extension:


Set the parameters as shown, and you're good to go.
Flux2VAE does not currently have an appropriate high-quality preview method; please use the Approx Cheap option, which shows a simple PCA projection of the latents (ReForge).
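
For the curious, a minimal sketch of what such a cheap PCA preview does (illustrative only, not ReForge's actual implementation): the many latent channels are projected down to three pseudo-RGB channels.

```python
import torch

def latent_pca_preview(latent: torch.Tensor) -> torch.Tensor:
    """Rough RGB preview of a multi-channel latent [C, H, W] via PCA.
    Returns an [H, W, 3] float image scaled to [0, 1]."""
    c, h, w = latent.shape
    flat = latent.reshape(c, -1).T                      # [H*W, C], one row per pixel
    flat = flat - flat.mean(dim=0, keepdim=True)        # center before PCA
    _, _, v = torch.pca_lowrank(flat, q=3, center=False)
    rgb = flat @ v                                      # project onto the top 3 components
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-8)
    return rgb.reshape(h, w, 3)
```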
Recommended Parameters:
**Sampler**: Euler A Comfy RF, Euler A2, Euler, DPM++ SDE Comfy, etc. **ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI this routing is automatic, but it is not in the WebUI.**
**Steps**: 20-28
**CFG**: 6-9
**Shift**: 3-12
**Schedule**: Normal/Simple/SGM Uniform
**Positive Quality Tags**: `masterpiece, best quality`
**Negative Tags**: `worst quality, normal quality, bad anatomy`
**ADETAILER FIX FOR RF**:
By default, ADetailer discards the Advanced Model Sampling extension, which breaks RF. You need to add AMS to this part of the settings:

Add `advanced_model_sampling_script,advanced_model_sampling_script_backported` there.
If that does not work, go into the ADetailer extension, find args.py, open it, and replace `_builtin_script` like this:

Here is a copy-paste version:
```
# include the AMS scripts so ADetailer does not discard them
_builtin_script = (
    "advanced_model_sampling_script",
    "advanced_model_sampling_script_backported",
    "hypertile_script",
    "soft_inpainting",
)
```
Or use my fork of Adetailer - https://github.com/Anzhc/aadetailer-reforge
## Training
### Model Composition
(Relative to the base it was trained from)
Unet: Same
CLIP L: Same, Frozen
CLIP G: Same, Frozen
VAE: [Flux2 VAE](https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/main/vae)
### Training Details
(Main Stage Training)
**Samples Seen** (unbatched steps): ~50 million
**Learning Rate**: 6e-5 to 5e-5 (mixed)
**Effective Batch size**: ~1400
**Precision**: Mixed BF16
**Optimizer**: AdamW8bit with Kahan Summation
**Weight Decay**: 0.01
**Schedule**: Constant with warmup
**Timestep Sampling Strategy**: Logit-Normal -0.2 1.5 (sometimes referred to as Lognorm), Shift 2.5 (see the sketch below)
**Text Encoders**: Frozen
**Keep Token**: False
**Tag Dropout**: 10%
**Uncond Dropout**: 10%
**Shuffle**: True
**VAE Conv Padding**: False
**VAE Shift**: 0.0760
**VAE Scale**: 0.6043
**Additional Features used**: Protected Tags, Cosine Optimal Transport.
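
For readers less familiar with these knobs, below is a minimal, illustrative sketch of how the VAE shift/scale normalisation and the logit-normal (+ shift) timestep sampling enter a rectified-flow training step. The real implementation lives in the SD-Scripts fork listed under Software, and its exact conventions may differ:

```python
import torch

VAE_SHIFT, VAE_SCALE = 0.0760, 0.6043  # latent normalisation constants from the table above

def normalize_latents(z: torch.Tensor) -> torch.Tensor:
    """Map raw Flux2 VAE latents into the normalised space the UNet is trained in."""
    return (z - VAE_SHIFT) * VAE_SCALE

def sample_timesteps(batch: int, loc: float = -0.2, scale: float = 1.5,
                     shift: float = 2.5) -> torch.Tensor:
    """Logit-normal timestep sampling with a flow shift; returns t in (0, 1)."""
    t = torch.sigmoid(torch.randn(batch) * scale + loc)   # logit-normal(loc, scale)
    return shift * t / (1.0 + (shift - 1.0) * t)           # shift biases t toward the noisy end

def rectified_flow_pair(x0: torch.Tensor, noise: torch.Tensor, t: torch.Tensor):
    """Linear interpolation between data and noise, plus the velocity target."""
    t = t.view(-1, 1, 1, 1)
    x_t = (1.0 - t) * x0 + t * noise
    return x_t, noise - x0        # the model is trained to predict this velocity
```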
#### Training Data
6 epochs of the original NoobAI dataset, including images up to October 2024, minus the screencap data (which was not shared).
### LoRA Training
The current state of the model provides adequate trainability, but expect to train for a bit longer than usual, as the model is still undertrained.
My current style training settings (Anzhc):
**Learning Rate**: tested up to **7.5e-4**
**Batch Size**: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation) - without SGA I would probably lower accum to 4-8.
**Optimizer**: AdamW8bit with Kahan summation
**Schedule**: ReREX (Use REX for simplicity, or Cosine annealing)
**Precision**: Full BF16
**Weight Decay**: 0.02
**Timestep Sampling Strategy**: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5-4.5
**Dim/Alpha/Conv/Alpha**: 24/24/24/24 (Lycoris/Locon)
**Text Encoders**: Frozen
**Optimal Transport**: True (see the sketch below)
**Expected Dataset Size**: 100-200 images (can be as few as 10, but balance with repeats to roughly this target.)
**Epochs**: 50
Concepts seem to train at a similar speed to prior NoobAI models, but we have not tested this explicitly.
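
A rough sketch of what optimal-transport noise pairing with a cosine cost refers to (illustrative only; the exact behaviour in the SD-Scripts fork may differ): within each batch, noise samples are re-assigned to the latents they are closest to, which shortens and straightens the flow paths the model has to learn.

```python
import torch
from scipy.optimize import linear_sum_assignment

def cosine_ot_pairing(latents: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Minibatch OT coupling: permute the noise batch so each latent is paired
    with the noise sample closest to it under a cosine-distance cost."""
    b = latents.shape[0]
    z = torch.nn.functional.normalize(latents.reshape(b, -1), dim=1)
    n = torch.nn.functional.normalize(noise.reshape(b, -1), dim=1)
    cost = 1.0 - z @ n.T                                  # [B, B] pairwise cosine distance
    _, col = linear_sum_assignment(cost.detach().cpu().numpy())
    return noise[torch.as_tensor(col, device=noise.device)]
```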
### Hardware
The model was trained on a cloud 8xH200 node.
### Software
Custom fork of [SD-Scripts](https://github.com/bluvoll/sd-scripts) (maintained by Bluvoll)
## Acknowledgements
### Special Thanks
**To a special supporter who singlehandedly sponsored the whole run and preferred to stay anonymous**
---
# Support
If you wish to support our continuous effort of making waifus 0.2% better, you can do so here:
### **https://ko-fi.com/bluvoll** (Blu, donate here to support training)
https://ko-fi.com/anzhc (Anzhc, non-training, just survival)

Crypto link pending.
# Potential future
**Expected Compute Needed**: We still consider a full run to be in the range of 20+ epochs, but we no longer think that is the bare minimum for a stable model, as progress within just the current 6 epochs has been quite drastic in that regard. 10 epochs is likely a good marker for that.
**Dataset**: We would love to start processing the booru data with our in-house classification models to fix some of the glaring issues with the default Danbooru dataset, as well as give thorough treatment to some of the concepts, but as of now we don't have the budget to rent a dedicated server for persistent storage.
**Future Training**: We have confirmation from our Sponsor that we will continue training the model beyond Epoch 6, but it will resume after a short break.