---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for Initial Noise Loader for Stable Diffusion XL
This custom pipeline provides an initial-noise loader for the Stable Diffusion XL architecture: the `NoiseLoaderMixin` class, inspired by the LoRA and textual-inversion loaders in the diffusers library. It lets you change the distribution of the initial noise that the generation process starts from with a single line of code: `custom_pipeline.load_initial_noise_modifier(...)`.
Currently implemented methods:
- Start generation from a fixed noise.
  Example: `custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=…)`
- Golden Noise for Diffusion Models: A Learning Framework (Zhou et al., https://arxiv.org/abs/2411.09502).
  Example: `custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path=…)`
- General normal distribution: sample the initial noise from a user-defined general normal distribution.
  Example: `custom_pipeline.load_initial_noise_modifier(method="general-normal-distribution", init_noise_mean=(0, -0.1, 0.2, 0), init_noise_std=(1, 1, 1, 1))`
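For intuition, the general-normal-distribution option amounts to shifting and rescaling standard Gaussian latents per channel. The sketch below shows that sampling step in plain PyTorch; the function name and the SDXL latent-shape convention (4 channels, spatial dimensions divided by 8) are illustrative, not the pipeline's actual internals:

```python
import torch

def sample_general_normal_latents(batch_size, height, width,
                                  init_noise_mean=(0.0, -0.1, 0.2, 0.0),
                                  init_noise_std=(1.0, 1.0, 1.0, 1.0),
                                  generator=None):
    """Draw SDXL-shaped initial latents where channel c follows
    N(init_noise_mean[c], init_noise_std[c] ** 2)."""
    latents = torch.randn(batch_size, 4, height // 8, width // 8,
                          generator=generator)
    mean = torch.tensor(init_noise_mean).view(1, 4, 1, 1)
    std = torch.tensor(init_noise_std).view(1, 4, 1, 1)
    return latents * std + mean

latents = sample_general_normal_latents(
    1, 1024, 1024, generator=torch.Generator().manual_seed(0))
print(latents.shape)  # torch.Size([1, 4, 128, 128])
```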
Demo Notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-owYN8r2TbT-Je_eTEpnIMLj1nvxPYqI#scrollTo=HQS6OQ44jz66)
## Citation
If you find my code useful, you may cite:
```bibtex
@misc{initial_noise,
  author = {Syrine Noamen},
  title = {Initial Noise Loader for Stable Diffusion XL},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/syrinenoamen/stable-diffusion_xl_initial_noise_loader}},
}
```
## Example 1: Start generation from a fixed noise
This example is mostly for demonstration, since the same behaviour can already be achieved easily with stock diffusers.
### Uses
```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)
custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=12345)
```
![Different seeds](examples/fixed-seed.png)
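As noted above, fixed starting noise is also available in stock diffusers without this pipeline: seed a `torch.Generator`, or pre-compute the latents yourself and pass them through the pipeline call's `latents=` argument. A quick sketch:

```python
import torch

# Same seed -> same starting latents on every call
generator = torch.Generator(device="cpu").manual_seed(12345)
fixed_latents = torch.randn(1, 4, 128, 128, generator=generator)

# pipe(prompt, latents=fixed_latents)  # reuse the exact noise tensor, or
# pipe(prompt, generator=torch.Generator().manual_seed(12345))  # reseed per call
```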
## Example 2: Golden Noise for Diffusion Models: A Learning Framework ([Zhou et al., 2024](https://arxiv.org/abs/2411.09502))
### Requirements
```shell
pip install timm einops
```
### Uses
```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)
# Enable golden-noise initialisation with the downloaded NPNet checkpoint
custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path=…)
```
![Golden Noise](examples/golden-noise.png)
### Golden Noise citation
Code adapted from the official [GitHub repo](https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models):
```bibtex
@misc{zhou2024goldennoisediffusionmodels,
  title={Golden Noise for Diffusion Models: A Learning Framework},
  author={Zikai Zhou and Shitong Shao and Lichen Bai and Zhiqiang Xu and Bo Han and Zeke Xie},
  year={2024},
  eprint={2411.09502},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2411.09502},
}
```
## Example 3: General Normal Distribution
The latent space of SDXL is a 4-channel tensor with interpretable semantics. Channel 1 primarily encodes luminance or overall brightness, while Channel 2 captures the cyan–red color axis, and Channel 3 represents the green–blue axis. Channel 4 encodes structure and patterns.
By manipulating the mean values of these channels—particularly those associated with color—you can bias the generation process toward specific visual tones or styles. This allows for a degree of control over the image's color palette directly in the latent space, without modifying the text prompt or conditioning vectors.
<div style="display: flex; justify-content: space-between; align-items: center;">
<div style="text-align: center; flex: 1; margin-right: 10px;">
<img src="examples/mountain_blue.png" alt="Blue, purple tone" style="width:100%;">
<p><em>(a) Biased toward blue and purple tones</em></p>
</div>
<div style="text-align: center; flex: 1; margin-left: 10px;">
<img src="examples/mountain_red.png" alt="Red, orange tone" style="width:100%;">
<p><em>(b) Biased toward red and orange tones</em></p>
</div>
</div>
<p style="text-align: center;"><strong>Figure:</strong> Controlling the latent space color distribution biases the generation toward different global color schemes.</p>
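The per-channel mean shift behind panels (a) and (b) can be sketched directly in PyTorch. Channel indices are 0-based here, so the cyan-red "Channel 2" above is index 1 and the green-blue "Channel 3" is index 2; the sign conventions and magnitudes below are assumptions to verify empirically against your VAE:

```python
import torch

def color_biased_latents(channel_means, seed=0, shape=(1, 4, 128, 128)):
    """Standard Gaussian initial latents shifted by a per-channel mean."""
    generator = torch.Generator().manual_seed(seed)
    latents = torch.randn(*shape, generator=generator)
    return latents + torch.tensor(channel_means).view(1, 4, 1, 1)

# Illustrative biases (assumed signs): push the cyan-red axis (index 1) and
# the green-blue axis (index 2) in opposite directions for cool vs. warm tones.
cool = color_biased_latents((0.0, -0.3, 0.3, 0.0))
warm = color_biased_latents((0.0, 0.3, -0.3, 0.0))
```

With this pipeline, the equivalent would be passing the same means through `load_initial_noise_modifier(method="general-normal-distribution", init_noise_mean=..., init_noise_std=(1, 1, 1, 1))` rather than building the latents by hand.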