HiDream-O1-Image-Dev-2604 · Merged FP16 + ComfyUI Workflows
This is my first time merging a sharded file. I am including the script used to conduct the merge.
A single-file FP16 repack of HiDream-ai's HiDream-O1-Image-Dev-2604, built by merging the upstream's 8 sharded .safetensors files into one ~35 GB checkpoint that drops straight into ComfyUI. Ships with two ready-to-run workflows: a clean text-to-image graph and a reference-image editing graph that wires up all ten of HiDream's reference slots.
Why this exists. The official release is sharded across 8 files, which the stock ComfyUI CheckpointLoaderSimple won't load. The community FP8 builds load fine but trade precision for speed. This repo gives you the highest-precision official Dev weights in one file so the standard loader just works.
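For readers curious what such a merge involves, here is a minimal stdlib-only sketch. It is not the script shipped with this repo. It assumes the standard safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes) and that no tensor name appears in more than one shard. The names `read_header` and `merge_shards` are hypothetical.

```python
import json
import struct
from pathlib import Path


def read_header(path):
    """Return (header_dict, data_start) for one .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header, 8 + header_len


def merge_shards(shard_paths, out_path, chunk_size=1 << 24):
    """Concatenate shard data buffers and rebase every tensor's offsets.

    Returns the number of tensors written. Per-shard "__metadata__"
    entries are dropped in this sketch.
    """
    merged = {}
    plan = []  # (path, data_start) for each shard, in output order
    base = 0   # where the current shard's data begins in the merged buffer
    for path in shard_paths:
        header, data_start = read_header(path)
        header.pop("__metadata__", None)
        for name, info in header.items():
            if name in merged:
                raise ValueError(f"duplicate tensor name: {name}")
            begin, end = info["data_offsets"]
            merged[name] = {**info, "data_offsets": [begin + base, end + base]}
        plan.append((path, data_start))
        base += Path(path).stat().st_size - data_start

    blob = json.dumps(merged, separators=(",", ":")).encode("utf-8")
    blob += b" " * (-len(blob) % 8)  # pad the header so data starts 8-byte aligned

    with open(out_path, "wb") as out:
        out.write(struct.pack("<Q", len(blob)))
        out.write(blob)
        for path, data_start in plan:
            with open(path, "rb") as f:
                f.seek(data_start)
                # Stream in chunks so a multi-GB shard never sits in RAM at once.
                while True:
                    buf = f.read(chunk_size)
                    if not buf:
                        break
                    out.write(buf)
    return len(merged)
```

Copying each shard's data buffer verbatim and shifting only the offsets keeps the tensor bytes untouched, so the merged file holds exactly the upstream weights.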
What's in this repo
| File | Purpose |
|---|---|
| HiDream-O1-Image-Dev-2604-FP16.safetensors | The merged FP16 checkpoint (~35 GB) |
| image_hidream_o1_dev_t2i_fp16.json | Text-to-image ComfyUI workflow |
| image_hidream_o1_dev_i2i_fp16.json | Image-editing ComfyUI workflow with 10 reference slots |
About HiDream-O1
HiDream-O1 is an 8B-parameter, pixel-level Unified Transformer that handles text-to-image, image editing, and subject-driven personalization in a single model. It's notable for:
- No VAE: pixels and text tokens share one unified representation, removing a common quality bottleneck
- Native 2K resolution (trained at 2048×2048 and several other aspect ratios)
- Reference-image conditioning: accepts up to 10 reference images for editing or composition
- Strong leaderboard performance: ranked #8 on the Artificial Analysis Text-to-Image Arena at time of writing
The "Dev" branch is a distillation of the full HiDream-O1 model. Its native step count is ~28 in text-to-image mode; the included image-edit workflow uses 50 steps with CFG 5.0 for the highest-quality output the distilled weights can produce.
Quickstart (ComfyUI)
1. Download the checkpoint
Drop HiDream-O1-Image-Dev-2604-FP16.safetensors into:
ComfyUI/models/checkpoints/
2. Download the text encoder (required)
HiDream-O1 uses a Gemma-4 text encoder, distributed separately:
- File: gemma4_e4b_it_fp8_scaled.safetensors
- Location: ComfyUI/models/text_encoders/
ComfyUI/
└── models/
    ├── checkpoints/
    │   └── HiDream-O1-Image-Dev-2604-FP16.safetensors
    └── text_encoders/
        └── gemma4_e4b_it_fp8_scaled.safetensors
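If the loader dropdown stays empty, a quick script can confirm that both files are where this layout expects them. This is only a convenience sketch: `missing_files` is a hypothetical helper, and the root path you pass in is whatever your own install uses.

```python
from pathlib import Path

# Relative paths taken from the layout above.
REQUIRED = [
    "models/checkpoints/HiDream-O1-Image-Dev-2604-FP16.safetensors",
    "models/text_encoders/gemma4_e4b_it_fp8_scaled.safetensors",
]


def missing_files(comfy_root):
    """Return the required files that are absent under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]
```

Calling `missing_files("ComfyUI")` from the directory that contains your install returns an empty list when both files are in place.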
3. Update ComfyUI
HiDream-O1 support is built into ComfyUI core (PR #13817). Make sure you're on a recent build:
cd ComfyUI && git pull
Desktop and Cloud users: update through the app.
4. Load a workflow
Drag either .json into the ComfyUI canvas and hit Queue Prompt.
Workflows
Text-to-Image (image_hidream_o1_dev_t2i_fp16.json)
A minimal t2i pipeline:
CheckpointLoader → ModelNoiseScale → BasicScheduler ──┐
                                     SamplerLCM ──────┤
User Prompt → CLIPTextEncode(+) ──────────────────────┤
              CLIPTextEncode(−) ──────────────────────┤→ SamplerCustom → VAEDecode → SaveImage
EmptyHiDreamO1LatentImage ────────────────────────────┘
Edit the User Prompt node, set width/height on EmptyHiDreamO1LatentImage to one of the model's trained resolutions (see the in-workflow note), and run.
Image Editing (image_hidream_o1_dev_i2i_fp16.json)
Reference-image editing with all 10 input slots pre-wired. Slot 1 is enabled and points at goose.png; slots 2–10 are bypassed, so their wires stay visible but they don't execute.
Configured for quality:
- 50 sampling steps (BasicScheduler)
- CFG 5.0 (SamplerCustom)
- Negative prompt pre-filled with common artifact terms
- ImageScaleToTotalPixels auto-resizes the reference to ~4 MP (close to the model's native training range)
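As a reminder of what the CFG value controls, classifier-free guidance is usually written as a blend of a conditional (prompted) and an unconditional (negative-prompt) prediction. The sketch below shows that textbook formula on plain Python lists. It is not ComfyUI's or HiDream's sampler code, and `cfg_combine` is a hypothetical name.

```python
def cfg_combine(cond, uncond, scale):
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# With scale 1.0 the result equals the conditional prediction; larger values
# push further along the direction from the unconditional to the conditional.
guided = cfg_combine([2.0, 0.5], [1.0, 0.5], 5.0)
```

Where the two predictions agree (the second element here), the scale has no effect; it only amplifies what the prompt changes.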
To add more reference images:
- Click any bypassed (grayed-out) LoadImage node
- Press Ctrl+B to enable it
- Choose an image
Only reference image #1 drives the output dimensions; additional references contribute to conditioning only.
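Because reference image #1 sets the output size after the ~4 MP resize, it can help to estimate that size in advance. The sketch below scales width and height by the same factor so the pixel count lands near the target. It assumes 1 MP means 1024 × 1024 pixels and plain rounding; the node's exact rounding may differ, and `scale_to_total_pixels` is a hypothetical name.

```python
import math


def scale_to_total_pixels(width, height, megapixels=4.0, pixels_per_mp=1024 * 1024):
    """Return (new_width, new_height) with about `megapixels` total pixels.

    Both sides are multiplied by the same factor, so the aspect ratio is kept.
    pixels_per_mp is an assumption about how the node defines a megapixel.
    """
    target = megapixels * pixels_per_mp
    scale = math.sqrt(target / (width * height))
    return round(width * scale), round(height * scale)
```

Under these assumptions a 1920×1080 reference comes out at about 2731×1536, and a 1024×1024 reference at 2048×2048.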
Hardware
| Setup | Status |
|---|---|
| 24 GB VRAM (RTX 3090, 4090, A5000) | Recommended; fits comfortably |
| 16 GB VRAM (RTX 4080, A4000) | Workable with ComfyUI's offloading; expect slower runs |
| 12 GB VRAM and below | Not recommended for FP16; use a community FP8 build instead |
Plan for roughly 35 GB of free system RAM during the initial load: ComfyUI streams weights to VRAM, but the merged checkpoint is a single unsharded file.
Tips
- Stick with the Dev model. The HiDream developers note that the Dev branch produces fewer grid artifacts than the Full model in reference-image work, so Dev-2604 is the version you want for editing.
- CFG and ModelNoiseScale are different knobs. SamplerCustom's CFG (5.0 here) is classifier-free guidance during sampling. ModelNoiseScale (7.6 default) is HiDream's per-step noise scaling. Both move along the prompt-adherence axis; raise CFG for stronger prompt obedience, lower it if outputs over-saturate.
- Step count vs. distillation. The Dev model is distilled for ~28 steps. Going beyond that yields diminishing quality improvements; 50 steps is on the high end of useful. The Full (non-distilled) model is what you'd want for 80–100 step runs.
- Trained resolutions. Outputs land best when you target one of: 2048×2048, 2304×1728/1792, 2496×1664, 2560×1440, 3104×1312 (plus the portrait swaps). The in-workflow notes have the full table.
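To pick a trained resolution for an arbitrary target shape, one option is to choose the listed size with the closest aspect ratio. The sketch below does that using only the sizes named in the tip above. It reads "2304×1728/1792" as two entries (2304×1728 and 2304×1792), which is an assumption, and `nearest_trained_resolution` is a hypothetical helper; the in-workflow table remains the authoritative list.

```python
import math

# Landscape sizes from the tip above; "2304x1728/1792" is assumed to mean
# two separate entries.
LANDSCAPE = [
    (2048, 2048), (2304, 1728), (2304, 1792),
    (2496, 1664), (2560, 1440), (3104, 1312),
]
# Add the portrait swaps; the set removes the duplicated square entry.
TRAINED = sorted(set(LANDSCAPE + [(h, w) for w, h in LANDSCAPE]))


def nearest_trained_resolution(width, height):
    """Return the listed (width, height) whose aspect ratio is closest.

    Ratios are compared in log space so that 2:1 and 1:2 are treated as
    equally far from square.
    """
    want = math.log(width / height)
    return min(TRAINED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - want))
```

For example, a 16:9 target such as 1920×1080 maps to 2560×1440, and its portrait counterpart maps to 1440×2560.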
License
This repo inherits the MIT license from the upstream HiDream-O1 release. You can use the weights for personal, research, and commercial purposes. See the original model card for the full text.
Citation
@article{hidreamo1image,
title = {HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer},
author = {Cai, Qi and Chen, Jingwen and Gao, Chengmin and Gong, Zijian and Li, Yehao and Mei, Tao and Pan, Yingwei and Peng, Yi and Qiu, Zhaofan and Yao, Ting and Yu, Kai and Zhang, Yiheng and others},
journal = {arXiv preprint arXiv:2605.11061},
year = {2026}
}
Credits
- Model: HiDream-ai, the team that built and released HiDream-O1
- ComfyUI integration: @kijai and the Comfy-Org team (PR #13817)
- Text encoder: Google DeepMind (Gemma family), repackaged by Comfy-Org
- FP16 merge + workflows: this repository
Troubleshooting
| Problem | Try |
|---|---|
| CheckpointLoaderSimple can't find the model | Confirm the .safetensors is in ComfyUI/models/checkpoints/ and refresh the dropdown |
| "Unknown model architecture" / load fails | Update ComfyUI; native HiDream-O1 support landed in PR #13817 |
| Out-of-memory during the first run | Drop to a smaller megapixels value on ImageScaleToTotalPixels, or try the community FP8 build for tighter VRAM budgets |
| Reference-image edits look smeary or have grid seams | Confirm you're using the Dev model (not Full), and that LoadImage #1 has a clean, well-lit image; slot 1 drives the output dimensions |
| Bypassed LoadImage nodes throwing errors | Make sure they're truly bypassed (mode 4, gray outline). Right-click → Bypass if not. |
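For the last row, the bypass state can also be checked outside the UI by reading the workflow file. The sketch below assumes the saved workflow JSON has a top-level "nodes" list whose entries carry "id", "type", and "mode" fields, with mode 4 meaning bypassed as noted in the table. That layout is an assumption about ComfyUI's export format, and the function names are hypothetical.

```python
import json

MODE_BYPASS = 4  # the "mode 4" mentioned in the table above


def load_image_modes(workflow):
    """Map each LoadImage node id to its mode (0 if the field is absent)."""
    return {
        node["id"]: node.get("mode", 0)
        for node in workflow.get("nodes", [])
        if node.get("type") == "LoadImage"
    }


def active_load_images(workflow):
    """Return the ids of LoadImage nodes that are not bypassed."""
    modes = load_image_modes(workflow)
    return sorted(node_id for node_id, mode in modes.items() if mode != MODE_BYPASS)


def active_load_images_in_file(path):
    """Same check, reading the workflow from a .json file on disk."""
    with open(path, "r", encoding="utf-8") as f:
        return active_load_images(json.load(f))
```

If the result lists more node ids than the reference images you actually supplied, one of the extra slots is still enabled.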
Model tree for HodgeMann/HiDream-O1-Image-Dev-2604-FP16-merged
Base model
HiDream-ai/HiDream-O1-Image-Dev-2604