---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for Initial Noise Loader for Stable Diffusion XL
This custom pipeline provides an initial-noise loader for the Stable Diffusion XL architecture: the `NoiseLoaderMixin` class, inspired by the LoRA and textual-inversion loaders in the diffusers library. It lets you change the distribution of the initial noise that the generation process starts from with a single line of code: `custom_pipeline.load_initial_noise_modifier(...)`.
Currently implemented methods:
- Start generation from a fixed noise.
  Example: `custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=…)`
- Golden Noise for Diffusion Models: A Learning Framework (Zhou et al., https://arxiv.org/abs/2411.09502).
  Example: `custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path=…)`
- General normal distribution: sample the initial noise from a user-defined general normal distribution.
  Example: `custom_pipeline.load_initial_noise_modifier(method="general-normal-distribution", init_noise_mean=(0, -0.1, 0.2, 0), init_noise_std=(1, 1, 1, 1))`
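For intuition, the general-normal-distribution option amounts to shifting and rescaling standard Gaussian latents per channel. The sketch below shows that sampling step in plain PyTorch; the function name and the SDXL latent-shape convention (4 channels, spatial dimensions divided by 8) are illustrative, not the pipeline's actual internals:

```python
import torch

def sample_general_normal_latents(batch_size, height, width,
                                  init_noise_mean=(0.0, -0.1, 0.2, 0.0),
                                  init_noise_std=(1.0, 1.0, 1.0, 1.0),
                                  generator=None):
    """Draw SDXL-shaped initial latents where channel c follows
    N(init_noise_mean[c], init_noise_std[c] ** 2)."""
    latents = torch.randn(batch_size, 4, height // 8, width // 8,
                          generator=generator)
    mean = torch.tensor(init_noise_mean).view(1, 4, 1, 1)
    std = torch.tensor(init_noise_std).view(1, 4, 1, 1)
    return latents * std + mean

latents = sample_general_normal_latents(
    1, 1024, 1024, generator=torch.Generator().manual_seed(0))
print(latents.shape)  # torch.Size([1, 4, 128, 128])
```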
Demo Notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-owYN8r2TbT-Je_eTEpnIMLj1nvxPYqI#scrollTo=HQS6OQ44jz66)
## Citation
If you find my code useful, you may cite:
```bibtex
@misc{initial_noise,
  author = {Syrine Noamen},
  title = {Initial Noise Loader for Stable Diffusion XL},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/syrinenoamen/stable-diffusion_xl_initial_noise_loader}},
}
```
## Example 1: Start generation from a fixed noise
This example is mostly for demonstration, since the same behaviour can already be achieved easily with stock diffusers.
### Uses
```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)
custom_pipeline.load_initial_noise_modifier(method="fixed-seed", seed=12345)
```
![Different seeds](examples/fixed-seed.png)
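As noted above, fixed starting noise is also available in stock diffusers without this pipeline: seed a `torch.Generator`, or pre-compute the latents yourself and pass them through the pipeline call's `latents=` argument. A quick sketch:

```python
import torch

# Same seed -> same starting latents on every call
generator = torch.Generator(device="cpu").manual_seed(12345)
fixed_latents = torch.randn(1, 4, 128, 128, generator=generator)

# pipe(prompt, latents=fixed_latents)  # reuse the exact noise tensor, or
# pipe(prompt, generator=torch.Generator().manual_seed(12345))  # reseed per call
```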
## Example 2: Golden Noise for Diffusion Models: A Learning Framework ([Zhou et al., 2024](https://arxiv.org/abs/2411.09502))
### Requirements
```shell
pip install timm einops
```
### Uses
```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

custom_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="syrinenoamen/stable-diffusion_xl_initial_noise_loader",
).to(device)
# Enable golden-noise initialisation with the downloaded NPNet checkpoint
custom_pipeline.load_initial_noise_modifier(method="golden-noise", npnet_path=…)
```
![Golden Noise](examples/golden-noise.png)
### Golden Noise citation
Code adapted from the official [GitHub repo](https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models):
```bibtex
@misc{zhou2024goldennoisediffusionmodels,
  title={Golden Noise for Diffusion Models: A Learning Framework},
  author={Zikai Zhou and Shitong Shao and Lichen Bai and Zhiqiang Xu and Bo Han and Zeke Xie},
  year={2024},
  eprint={2411.09502},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2411.09502},
}
```
## Example 3: General Normal Distribution
The latent space of SDXL is a 4-channel tensor with interpretable semantics. Channel 1 primarily encodes luminance or overall brightness, while Channel 2 captures the cyan–red color axis, and Channel 3 represents the green–blue axis. Channel 4 encodes structure and patterns.
By manipulating the mean values of these channels—particularly those associated with color—you can bias the generation process toward specific visual tones or styles. This allows for a degree of control over the image's color palette directly in the latent space, without modifying the text prompt or conditioning vectors.
<div style="display: flex; justify-content: space-between; align-items: center;">
<div style="text-align: center; flex: 1; margin-right: 10px;">
<img src="examples/mountain_blue.png" alt="Blue, purple tone" style="width:100%;">
<p><em>(a) Biased toward blue and purple tones</em></p>
</div>
<div style="text-align: center; flex: 1; margin-left: 10px;">
<img src="examples/mountain_red.png" alt="Red, orange tone" style="width:100%;">
<p><em>(b) Biased toward red and orange tones</em></p>
</div>
</div>
<p style="text-align: center;"><strong>Figure:</strong> Controlling the latent space color distribution biases the generation toward different global color schemes.</p>
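The per-channel mean shift behind panels (a) and (b) can be sketched directly in PyTorch. Channel indices are 0-based here, so the cyan-red "Channel 2" above is index 1 and the green-blue "Channel 3" is index 2; the sign conventions and magnitudes below are assumptions to verify empirically against your VAE:

```python
import torch

def color_biased_latents(channel_means, seed=0, shape=(1, 4, 128, 128)):
    """Standard Gaussian initial latents shifted by a per-channel mean."""
    generator = torch.Generator().manual_seed(seed)
    latents = torch.randn(*shape, generator=generator)
    return latents + torch.tensor(channel_means).view(1, 4, 1, 1)

# Illustrative biases (assumed signs): push the cyan-red axis (index 1) and
# the green-blue axis (index 2) in opposite directions for cool vs. warm tones.
cool = color_biased_latents((0.0, -0.3, 0.3, 0.0))
warm = color_biased_latents((0.0, 0.3, -0.3, 0.0))
```

With this pipeline, the equivalent would be passing the same means through `load_initial_noise_modifier(method="general-normal-distribution", init_noise_mean=..., init_noise_std=(1, 1, 1, 1))` rather than building the latents by hand.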