
	---
	license: apache-2.0
	---

	## NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

	[Homepage](https://stepfun.ai/research/en/nextstep-1)  \| [GitHub](https://github.com/stepfun-ai/NextStep-1)  \| [Paper](https://github.com/stepfun-ai/NextStep-1/blob/main/nextstep_1_tech_report.pdf)

	We introduce NextStep-1, a 14B autoregressive model paired with a 157M flow matching head, trained on discrete text tokens and continuous image tokens with a next-token prediction objective.
	NextStep-1 achieves state-of-the-art performance among autoregressive models on text-to-image generation and shows strong capabilities in high-fidelity image synthesis.

	<div align='center'>
	<img src="assets/teaser.jpg" class="interpolation-image" alt="arch." width="100%" />
	</div>

	## ENV Preparation

	To avoid errors when loading and running the model, we recommend the following environment:

	```shell
	conda create -n nextstep python=3.11 -y
	conda activate nextstep

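	pip install uv # optional; without uv, run plain `pip install -r requirements.txt` instead

	# requirements.txt ships with this repo; download it first, then install the pinned versions
	uv pip install -r requirements.txt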

	# diffusers==0.34.0
	# einops==0.8.1
	# gradio==5.42.0
	# loguru==0.7.3
	# numpy==1.26.4
	# omegaconf==2.3.0
	# Pillow==11.0.0
	# Requests==2.32.4
	# safetensors==0.5.3
	# tabulate==0.9.0
	# torch==2.5.1
	# torchvision==0.20.1
	# tqdm==4.67.1
	# transformers==4.55.0
	```
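
After installation, it can help to confirm that the active environment matches the pins listed above. The following is a minimal sketch and not part of the repository: `find_mismatches` and the `PINS` subset are illustrative names introduced here, and only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins listed above; extend it with the rest of requirements.txt as needed.
PINS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "transformers": "4.55.0",
    "diffusers": "0.34.0",
    "numpy": "1.26.4",
}


def find_mismatches(pins, lookup=version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is missing. Local build suffixes
    such as "+cu121" are ignored when comparing.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every checked package matches its pin. The `lookup` parameter exists only so the check can be exercised without touching the real environment.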

	## Usage

	```python
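	import torch
	from transformers import AutoTokenizer, AutoModel
	# NextStepPipeline is not part of transformers; this import assumes the script runs from a
	# checkout that contains models/gen_pipeline.py, such as the GitHub repository linked above.
	from models.gen_pipeline import NextStepPipeline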

	HF_HUB = "stepfun-ai/NextStep-1-Large"

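	# load model and tokenizer
	# local_files_only=True reads from the local Hugging Face cache only, so the weights must
	# already be downloaded; drop the flag to let transformers fetch them from the Hub.
	tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
	model = AutoModel.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
	pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device="cuda", dtype=torch.bfloat16)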

	# set prompts
	positive_prompt = "masterpiece, film grained, best quality."
	negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry."
	example_prompt = "A realistic photograph of a wall with \"NextStep-1.1 is coming\" prominently displayed"

	# generate image from text
	IMG_SIZE = 512
	image = pipeline.generate_image(
	    example_prompt,
	    hw=(IMG_SIZE, IMG_SIZE),
	    num_images_per_caption=1,
	    positive_prompt=positive_prompt,
	    negative_prompt=negative_prompt,
	    cfg=7.5,
	    cfg_img=1.0,
	    cfg_schedule="constant",
	    use_norm=False,
	    num_sampling_steps=28,
	    timesteps_shift=1.0,
	    seed=3407,
	)[0]
	image.save("./assets/output.jpg")
	```
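
To compare several samples of the same prompt, the `generate_image` call can be wrapped in a small loop over seeds. This is a sketch under stated assumptions rather than repository code: `generate_variations` and the `outputs` directory are hypothetical names, and the helper assumes only what the usage example shows, namely that `generate_image` accepts a `seed` keyword and returns a list of images with a `save` method.

```python
from pathlib import Path


def generate_variations(pipeline, prompt, seeds, out_dir="outputs", **generate_kwargs):
    """Generate one image per seed and save it as <out_dir>/seed_<seed>.jpg.

    `pipeline` is expected to expose `generate_image(prompt, seed=..., **kwargs)`
    returning a list of images, as in the usage example.
    """
    out_path = Path(out_dir)
    # Create the directory up front so saving does not fail on a missing path.
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for seed in seeds:
        image = pipeline.generate_image(prompt, seed=seed, **generate_kwargs)[0]
        target = out_path / f"seed_{seed}.jpg"
        image.save(target)
        saved.append(target)
    return saved
```

With the `pipeline` and prompts defined in the usage example, a call such as `generate_variations(pipeline, example_prompt, seeds=[3407, 1234], hw=(512, 512), cfg=7.5, num_sampling_steps=28)` would write one file per seed. All other generation arguments are passed through unchanged.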

	## Citation

	If you find NextStep useful for your research and applications, please consider starring this repository and citing:

	```bibtex
	@misc{nextstep_1,
	    title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
	    author={NextStep Team},
	    year={2025},
	    url={https://github.com/stepfun-ai/NextStep-1},
	}
	```