Dream is a new LLM developed by the HKU NLP Group. It applies the diffusion architecture typically used by image-generation AI to text. In other words, instead of generating text sequentially, token by token, it refines the entire output over several passes, in the style of "computer, enhance image!".
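To make the "enhance!" intuition concrete, here is a toy sketch of discrete diffusion decoding: start from a fully masked sequence and unmask a few positions per denoising step. This is not Dream's actual algorithm (a real model predicts the tokens; here a fixed target string stands in for the model's predictions), just an illustration of the iterative-unmasking idea.

```python
import random

MASK = "_"  # placeholder for a masked token position

def denoise(target, steps=4, seed=0):
    """Reveal the target sequence over `steps` denoising iterations.

    Returns the list of intermediate states, from fully masked to
    fully revealed, mimicking how a diffusion LM refines its output.
    """
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    hidden = list(range(len(target)))   # still-masked positions
    history = [list(seq)]
    for step in range(steps):
        # reveal roughly an equal share of the remaining masked slots
        k = max(1, len(hidden) // (steps - step))
        for pos in rng.sample(hidden, k):
            seq[pos] = target[pos]
            hidden.remove(pos)
        history.append(list(seq))
    return history

for state in denoise(list("hello world")):
    print("".join(state))
```

Each printed line is one denoising step, going from all masks to the final text, which is exactly the visualisation the demo linked below shows for the real model.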

The Dream v0 Instruct 7B model is small enough to run locally with llama.cpp (4.68 GB in Q4_K_M quantisation).

llama.cpp added diffusion support around July 2025.

Demo with denoising process visualisation: https://huggingface.co/spaces/multimodalart/Dream

The diffusion-specific CLI switches are implemented only in the diffusion-cli example, not in llama-server, which is designed around autoregressive causal language models with a KV cache.
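A minimal local run might look like the following. This is a sketch: the model filename is a placeholder, and the `--diffusion-steps` flag name is an assumption that may differ between llama.cpp builds, so check `llama-diffusion-cli --help` on your version first.

```shell
# Fetch the Q4_K_M GGUF from Hugging Face
huggingface-cli download keisuke-miyako/Dream-v0-Instruct-7B-gguf-q4_k_m \
  --local-dir ./dream-gguf

# Run the diffusion CLI (not llama-server).
# --diffusion-steps (assumed flag name) controls denoising iterations.
llama-diffusion-cli -m ./dream-gguf/<model-file>.gguf \
  -p "Explain diffusion language models in one paragraph." \
  --diffusion-steps 128
```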

cf. https://lab.cloud/blog/text-diffusion-support/

The GGUF quantisation is published on Hugging Face as keisuke-miyako/Dream-v0-Instruct-7B-gguf-q4_k_m (architecture: dream, ~8B params, 4-bit Q4_K_M).