HiDream-O1-Image-Dev-2604 · Merged FP16 + ComfyUI Workflows

This is my first time merging a sharded file. I am including the script used to conduct the merge.

A single-file FP16 repack of HiDream-ai's HiDream-O1-Image-Dev-2604, built by merging the upstream's 8 sharded .safetensors files into one ~35 GB checkpoint that drops straight into ComfyUI. Ships with two ready-to-run workflows: a clean text-to-image graph and a reference-image editing graph that wires up all ten of HiDream's reference slots.

Why this exists. The official release is sharded across 8 files, which the stock ComfyUI CheckpointLoaderSimple won't load. The community FP8 builds load fine but trade precision for speed. This repo gives you the highest-precision official Dev weights in one file so the standard loader just works.
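For readers curious what a shard merge like this involves, here is a minimal sketch. It is not the script used for this repo. It assumes the shards already store tensors in the target dtype (no FP16 conversion is done), that tensor names are unique across shards, and that the files follow the published safetensors layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The names `read_header` and `merge_shards` and the chunk size are illustrative choices.

```python
import json
import struct


def read_header(path):
    """Return (header dict, byte offset where tensor data starts) for one file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def merge_shards(shard_paths, out_path, chunk_size=64 * 1024 * 1024):
    """Concatenate the tensors of several shards into one .safetensors file."""
    merged = {}  # tensor name -> header entry with rewritten offsets
    plan = []    # (shard path, absolute start in shard, byte length)
    cursor = 0   # running offset inside the merged data buffer

    for path in shard_paths:
        header, data_start = read_header(path)
        header.pop("__metadata__", None)  # per-shard metadata is dropped in this sketch
        # Keep tensors in on-disk order so each shard is read front to back.
        entries = sorted(header.items(), key=lambda kv: kv[1]["data_offsets"][0])
        for name, info in entries:
            if name in merged:
                raise ValueError(f"duplicate tensor name across shards: {name}")
            begin, end = info["data_offsets"]
            size = end - begin
            merged[name] = {
                "dtype": info["dtype"],
                "shape": info["shape"],
                "data_offsets": [cursor, cursor + size],
            }
            plan.append((path, data_start + begin, size))
            cursor += size

    header_bytes = json.dumps(merged, separators=(",", ":")).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad the header to a multiple of 8 bytes

    with open(out_path, "wb") as out:
        out.write(struct.pack("<Q", len(header_bytes)))
        out.write(header_bytes)
        for path, start, size in plan:
            with open(path, "rb") as src:
                src.seek(start)
                remaining = size
                while remaining:
                    chunk = src.read(min(chunk_size, remaining))
                    if not chunk:
                        raise IOError(f"unexpected end of file in {path}")
                    out.write(chunk)
                    remaining -= len(chunk)
    return len(merged)
```

Given a list of shard paths and an output path, the function copies tensor bytes in chunks, so it never needs to hold the full ~35 GB in RAM. A real merge that also changes dtype would need a tensor library instead of this byte-level copy.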

What's in this repo

File	Purpose
`HiDream-O1-Image-Dev-2604-FP16.safetensors`	The merged FP16 checkpoint (~35 GB)
`image_hidream_o1_dev_t2i_fp16.json`	Text-to-image ComfyUI workflow
`image_hidream_o1_dev_i2i_fp16.json`	Image-editing ComfyUI workflow with 10 reference slots

About HiDream-O1

HiDream-O1 is an 8B-parameter, pixel-level Unified Transformer that handles text-to-image, image editing, and subject-driven personalization in a single model. It's notable for:

No VAE — pixels and text tokens share one unified representation, removing a common quality bottleneck
Native 2K resolution (trained at 2048×2048 and several other aspect ratios)
Reference-image conditioning — accepts up to 10 reference images for editing or composition
Strong leaderboard performance — ranked #8 on the Artificial Analysis Text-to-Image Arena at time of writing

The "Dev" branch is a distillation of the full HiDream-O1 model. Its native step count is ~28 in text-to-image mode; the included image-edit workflow uses 50 steps with CFG 5.0 for the highest-quality output the distilled weights can produce.

Quickstart (ComfyUI)

1. Download the checkpoint

Drop HiDream-O1-Image-Dev-2604-FP16.safetensors into:

ComfyUI/models/checkpoints/

2. Download the text encoder (required)

HiDream-O1 uses a Gemma-4 text encoder, distributed separately:

File: gemma4_e4b_it_fp8_scaled.safetensors
Location: ComfyUI/models/text_encoders/

ComfyUI/
├── models/
│   ├── checkpoints/
│   │   └── HiDream-O1-Image-Dev-2604-FP16.safetensors
│   └── text_encoders/
│       └── gemma4_e4b_it_fp8_scaled.safetensors
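
If you want to confirm the layout above before launching ComfyUI, a small check like the following can help. It is only a sketch: the helper name `missing_model_files` is made up for illustration, and the two relative paths are copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory layout in this README.
EXPECTED_FILES = [
    "models/checkpoints/HiDream-O1-Image-Dev-2604-FP16.safetensors",
    "models/text_encoders/gemma4_e4b_it_fp8_scaled.safetensors",
]


def missing_model_files(comfy_root):
    """Return the expected files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_model_files("ComfyUI")` from the directory that contains your ComfyUI checkout returns an empty list when both files are in place.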

3. Update ComfyUI

HiDream-O1 support is built into ComfyUI core (PR #13817). Make sure you're on a recent build:

cd ComfyUI && git pull

Desktop and Cloud users: update through the app.

4. Load a workflow

Drag either .json into the ComfyUI canvas and hit Queue Prompt.

Workflows

Text-to-Image (`image_hidream_o1_dev_t2i_fp16.json`)

A minimal t2i pipeline:

CheckpointLoader → ModelNoiseScale → BasicScheduler ─┐
                                  ↘ SamplerLCM ──────┤
User Prompt → CLIPTextEncode(+) ─────────────────────┤
              CLIPTextEncode(–) ─────────────────────┤→ SamplerCustom → VAEDecode → SaveImage
              EmptyHiDreamO1LatentImage ─────────────┘

Edit the User Prompt node, set width/height on EmptyHiDreamO1LatentImage to one of the model's trained resolutions (see the in-workflow note), and run.

Image Editing (`image_hidream_o1_dev_i2i_fp16.json`)

Reference-image editing with all 10 input slots pre-wired. Slot 1 is enabled and points at goose.png; slots 2–10 are bypassed so the wires stay visible but they don't execute.

Configured for quality:

50 sampling steps (BasicScheduler)
CFG 5.0 (SamplerCustom)
Negative prompt pre-filled with common artifact terms
ImageScaleToTotalPixels auto-resizes the reference to ~4 MP (close to the model's native training range)
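
The resize in the last item comes down to simple arithmetic: scale both sides by the same factor so the pixel count hits the target while the aspect ratio is preserved. The sketch below illustrates that idea only. It assumes a 4.0 megapixel setting and that one megapixel is counted as 1024×1024 pixels; neither detail is confirmed by this README, and `scale_to_total_pixels` is a hypothetical helper, not the node's actual code.

```python
import math


def scale_to_total_pixels(width, height, megapixels=4.0):
    """Return (new_width, new_height) with roughly `megapixels` total pixels."""
    target = megapixels * 1024 * 1024  # assumption: 1 MP = 1024 * 1024 pixels
    scale = math.sqrt(target / (width * height))
    return round(width * scale), round(height * scale)
```

Under these assumptions a 1024×1024 reference comes out as 2048×2048, which matches the model's square training size.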

To add more reference images:

Click any bypassed (grayed-out) LoadImage node
Press Ctrl+B to enable it
Choose an image

Only reference image #1 drives the output dimensions; additional references contribute to conditioning only.
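
To see which reference slots are active without opening the canvas, you can inspect the workflow JSON directly. This is a minimal sketch that assumes the usual ComfyUI workflow export layout (a top-level `nodes` list whose entries carry `id`, `type`, `mode`, and `widgets_values`) and that a LoadImage node keeps its filename as the first widget value. The helper names are made up for illustration.

```python
import json

BYPASS_MODE = 4  # the mode value this README associates with bypassed nodes


def load_image_slots(workflow):
    """Return (node id, image filename, is_bypassed) for each LoadImage node."""
    slots = []
    for node in workflow.get("nodes", []):
        if node.get("type") != "LoadImage":
            continue
        widgets = node.get("widgets_values") or [None]
        bypassed = node.get("mode", 0) == BYPASS_MODE
        slots.append((node.get("id"), widgets[0], bypassed))
    return slots


def load_image_slots_from_file(path):
    """Read a workflow .json from disk and list its LoadImage slots."""
    with open(path, "r", encoding="utf-8") as f:
        return load_image_slots(json.load(f))
```

Running `load_image_slots_from_file` on the image-editing workflow should, if the assumptions hold, list one active slot pointing at goose.png and nine bypassed ones.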

Hardware

Setup	Status
24 GB VRAM (RTX 3090, 4090, A5000)	Recommended — fits comfortably
16 GB VRAM (RTX 4080, A4000)	Workable with ComfyUI's offloading; expect slower runs
12 GB VRAM and below	Not recommended for FP16; use a community FP8 build instead

Roughly 35 GB of free system RAM also helps with the initial load. ComfyUI streams weights to VRAM, but the merged checkpoint is a single unsharded file, so the whole thing may be read at once.

Tips

Stick with the Dev model. The HiDream developers note that the Dev branch produces fewer grid artifacts than the Full model in reference-image work, so Dev-2604 is the version you want for editing.
CFG and ModelNoiseScale are different knobs. SamplerCustom's CFG (5.0 here) is classifier-free guidance during sampling. ModelNoiseScale (7.6 default) is HiDream's per-step noise scaling. Both affect prompt adherence; raise CFG for stronger prompt obedience, and lower it if outputs over-saturate.
Step count vs. distillation. The Dev model is distilled for ~28 steps. Going beyond that yields diminishing quality improvements — 50 steps is on the high end of useful. The Full (non-distilled) model is what you'd want for 80–100 step runs.
Trained resolutions. Outputs land best when you target one of: 2048×2048, 2304×1728/1792, 2496×1664, 2560×1440, 3104×1312 (plus the portrait swaps). The in-workflow notes have the full table.
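
To pick a size from the list above programmatically, one option is to choose the trained resolution whose aspect ratio is closest to the one you want. A minimal sketch follows. It reads "2304×1728/1792" as two entries (2304×1728 and 2304×1792), which is an assumption, and the helper name `nearest_trained_resolution` is made up for illustration. The in-workflow table remains the authoritative list.

```python
import math

# Landscape and square sizes from the tip above; portrait swaps are derived below.
TRAINED_LANDSCAPE = [
    (2048, 2048),
    (2304, 1728),
    (2304, 1792),  # assumption: "1728/1792" denotes two separate heights
    (2496, 1664),
    (2560, 1440),
    (3104, 1312),
]


def nearest_trained_resolution(width, height):
    """Return the trained (width, height) closest in aspect ratio to the input."""
    candidates = set(TRAINED_LANDSCAPE) | {(h, w) for w, h in TRAINED_LANDSCAPE}
    target = math.log(width / height)
    # Compare in log space so 2:1 and 1:2 are equally far from 1:1.
    return min(sorted(candidates), key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
```

For example, `nearest_trained_resolution(1920, 1080)` returns `(2560, 1440)`, and swapping the arguments returns the portrait size `(1440, 2560)`.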

License

This repo inherits the MIT license from the upstream HiDream-O1 release. You can use the weights for personal, research, and commercial purposes. See the original model card for the full text.

Citation

@article{hidreamo1image,
  title   = {HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer},
  author  = {Cai, Qi and Chen, Jingwen and Gao, Chengmin and Gong, Zijian and Li, Yehao and Mei, Tao and Pan, Yingwei and Peng, Yi and Qiu, Zhaofan and Yao, Ting and Yu, Kai and Zhang, Yiheng and others},
  journal = {arXiv preprint arXiv:2605.11061},
  year    = {2026}
}

Credits

Model: HiDream-ai — the team that built and released HiDream-O1
ComfyUI integration: @kijai and the Comfy-Org team — PR #13817
Text encoder: Google DeepMind (Gemma family), repackaged by Comfy-Org
FP16 merge + workflows: this repository

Troubleshooting

Problem	Try
`CheckpointLoaderSimple` can't find the model	Confirm the `.safetensors` is in `ComfyUI/models/checkpoints/` and refresh the dropdown
"Unknown model architecture" / load fails	Update ComfyUI — native HiDream-O1 support landed in PR #13817
Out-of-memory during the first run	Drop to a smaller `megapixels` value on `ImageScaleToTotalPixels`, or try the community FP8 build for tighter VRAM budgets
Reference-image edits look smeary or have grid seams	Confirm you're using the Dev model (not Full), and that LoadImage #1 has a clean, well-lit image — slot 1 drives the output dimensions
Bypassed LoadImage nodes throwing errors	Make sure they're truly bypassed (mode 4, gray outline). Right-click → Bypass if not.

