Prompt Fungineer v2

Takes a simple 2-10 word image description and expands it into a detailed 50-150 word prompt for AI image generators.

Prompt Fungineer v2 is a full fine-tune of Qwen/Qwen3-0.6B (596M parameters) trained on 9,400 Claude-generated prompt pairs. It produces detailed, richly specific prompts suitable for Midjourney, Stable Diffusion, DALL-E, Flux, and other image generation models.

This is the successor to treadon/prompt-fungineer-355M, which was built on GPT-2 Medium in 2023.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "treadon/prompt-fungineer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

def expand_prompt(short_description: str, max_new_tokens: int = 256) -> str:
    """Expand a short image description into a detailed generation prompt."""
    messages = [
        {
            "role": "user",
            "content": f"Expand this into a detailed image generation prompt: {short_description}"
        }
    ]
    # Render the conversation with the model's chat template and open the assistant turn.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # Sampling settings; no gradients are needed for inference.
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )

    # Decode only the newly generated tokens, skipping the echoed input.
    generated = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return generated.strip()

# Example
print(expand_prompt("cherry blossoms in rain"))

Example output:

A cherry blossom tree in full bloom photographed at the moment of rain, the petals floating and falling in the downpour. Fuji X-T5, 24mm f/4 ultra-wide lens, f/8, ISO 800, 1/1000s. The petals are translucent pink and white, each one catching the raindrop light differently. The rain falls in the background as a soft white blur. Color: vivid pink blossom against white rain, the sky behind in soft grey. Matte finish. The delicate pink against the wet grey sky. Japanese seasonal beauty.

Model Details

| Property | Value |
| --- | --- |
| Base model | Qwen/Qwen3-0.6B (2025 architecture) |
| Parameters | 596M |
| Training method | Full fine-tune (all parameters unfrozen) |
| Training data | 9,400 Claude-generated prompt pairs |
| Categories | 11 (landscapes, portraits, sci-fi, abstract, architecture, animals, historical, food, macro, emotions, vehicles) |
| Style distribution | 5 styles (photographic 52%, cinematic 14%, painterly 14%, digital art 10%, illustration 9%) |
| Epochs | 4 |
| Learning rate | 2e-5, cosine schedule with 6% warmup |
| Effective batch size | 32 (batch 1 x gradient accumulation 32) |
| Precision | float32 |
| Gradient checkpointing | Enabled |
| Max sequence length | 384 tokens |
| Training time | ~2h 50m on Apple M4 Pro (MPS backend) |
| Final train loss | 1.51 |
| Final eval loss | 2.67 |
| Optimizer | AdamW (weight decay 0.01) |
| License | Apache 2.0 |

v1 vs v2 Comparison

Prompt Fungineer v2 generates longer, more specific, and more technically detailed prompts than v1.

| Metric | v1 (GPT-2 355M) | v2 (Qwen3-0.6B) |
| --- | --- | --- |
| Parameters | 355M | 596M (1.7x) |
| Architecture | GPT-2 (2019) | Qwen3 (2025) |
| Avg output length | 92 words | 117 words (+27%) |
| Avg generation time | 2.6s | 4.8s |
| Training data | Web-scraped prompts | 9,400 Claude-generated pairs |

Side-by-Side Examples

| Input | v1 (GPT-2 355M) | v2 (Qwen3-0.6B) |
| --- | --- | --- |
| "sunset over the ocean" | Generic: "A breathtaking image of a tranquil, calm image of the horizon at sunset, revealing a serene, crystal-clear ocean..." | Specific: "The Pacific Ocean at sunset, the sea surface in perfect mirror -- the sky above reflected in the dark water below, the sky and its reflection creating a continuous white and gold spectacle from horizon to zenith." |
| "neon-lit street at night" | Generic: "A captivating image of a bustling, neon-lit street at night, with pedestrians weaving through..." | Specific: "A city street at 2am, every surface lit by a different coloured neon sign -- red traffic lights, blue bus stop signs, green bus number boards, yellow taxi stop boards." |
| "cherry blossoms in rain" | Wrong scene: confused with "a lush, tropical rainforest" | Correct scene: cherry blossom tree in rain, named specific camera (Fuji X-T5, 24mm f/4), technical settings (f/8, ISO 800, 1/1000s) |
| "wolf howling at the moon" | Hallucinated: "A haunting image of a deer stalking the night sky" | Correct subject: "A grey wolf stands in the moonlight, its amber eyes locked on the full moon. The moon is a perfect silver disc above the horizon." |
| "frozen waterfall" | Vague: "An enchanting image of a majestic waterfall cascading from a cliff" (missed the "frozen" part) | Accurate: "A waterfall frozen at its peak, the water solid and translucent in the cold air, the waterfall walls visible through the ice." |
| "grandmother baking bread" | Odd details: "leaving a trail of pink and orange bread crumbs on the cob" | Grounded: "A grandmother in her 80s holds a large wooden mixing board covered in a sheet of sourdough, steam rising from the mixture. Her hands are flour-dusted." |

Key Improvements in v2

  • Specificity: v2 names concrete details (locations, camera models, times of day) instead of generic adjectives
  • Technical vocabulary: v2 includes camera specs (f-stops, ISO, lens mm), composition directions, and lighting terms
  • Subject accuracy: v2 stays true to the input description; v1 sometimes drifted to unrelated subjects
  • Scene coherence: v2 builds a unified scene with a consistent mood rather than stacking unrelated descriptors

Known Limitations

  • Chat template artifacts: v2 occasionally includes fragments of the chat template (e.g., "user" or "assistant" role markers) in the generated output. Stripping these in post-processing is recommended.
  • Hallucinated details: v2 can introduce incorrect details. For example, "old man playing chess" produced a prompt mentioning a "mahjong table" despite correctly describing chess pieces and a chessboard.
  • Slower generation: v2 averages 4.8s per prompt vs 2.6s for v1, due to the larger model and longer outputs.
  • Repetitive phrasing: v2 sometimes falls into repetitive philosophical loops at the end of longer outputs (e.g., "The moon that is watching and the animal that is watching the moon. The moon that is always there.").
  • Overfitting gap: The gap between train loss (1.51) and eval loss (2.67) suggests some overfitting, which may contribute to occasional repetition or formulaic outputs.
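
The post-processing recommended for chat template artifacts could look like the sketch below. It is not part of the released model. The function name `strip_chat_artifacts` is hypothetical, and the artifact shapes it targets (ChatML control tokens, `<think>` blocks, and lines holding only a role name) are assumptions about how the leaked fragments appear, so the patterns should be adjusted against real outputs.

```python
import re

# Assumed artifact shapes (not confirmed by the model card): ChatML control
# tokens, think blocks, and lines that contain nothing but a role name.
_CONTROL_TOKENS = re.compile(r"<\|im_start\|>|<\|im_end\|>|<\|endoftext\|>")
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
_ROLE_ONLY_LINE = re.compile(r"^\s*(user|assistant|system)\s*:?\s*$", re.IGNORECASE)


def strip_chat_artifacts(text: str) -> str:
    """Remove leaked chat-template fragments from a generated prompt."""
    # Drop any think block first, since it can span several lines.
    text = _THINK_BLOCK.sub("", text)
    # Turn control tokens into line breaks so an adjacent role name
    # ends up alone on its own line.
    text = _CONTROL_TOKENS.sub("\n", text)
    # Keep non-empty lines that are not a bare role marker.
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not _ROLE_ONLY_LINE.match(line)
    ]
    # Rejoin as a single line, the usual shape for an image prompt.
    return " ".join(kept)


raw = "assistant\nA grey wolf stands in the moonlight.<|im_end|>\nuser\n"
cleaned = strip_chat_artifacts(raw)
```

Only lines that consist entirely of a role name are dropped, so a prompt that mentions a "user interface" mid-sentence is left alone. Multi-line outputs are collapsed to one line, which may not be wanted if line breaks carry meaning.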

Training Data

The model was trained on treadon/prompt-fungineer-v2-training-data, a dataset of 9,400 prompt pairs generated by Claude (Sonnet/Opus).

Category distribution:

| Category | Count |
| --- | --- |
| landscapes_nature | 1,288 |
| portraits_people | 1,091 |
| architecture_urban | 979 |
| historical_vintage | 940 |
| emotions_concepts | 812 |
| abstract_artistic | 774 |
| food_stilllife | 750 |
| macro_closeup | 742 |
| animals_wildlife | 701 |
| scifi_fantasy | 681 |
| vehicles_technology | 642 |

Each pair consists of a short input (1-9 words, avg 4.8) and a detailed output (53-181 words, avg 90.5). The data spans 5 style profiles: photographic, cinematic, painterly, digital art, and illustration.

Quality controls applied during generation:

  • No copyrighted names or characters
  • No NSFW content
  • Varied sentence structures
  • Prompt lengths between 50 and 150 words
  • Each pair validated: input at most 15 words, output at least 20 words
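
The per-pair validation rule above (input at most 15 words, output at least 20 words) can be sketched as a small filter. This is an illustration, not the dataset's actual validation script: the names `is_valid_pair` and `filter_pairs` are hypothetical, words are assumed to be whitespace-separated, and the requirement that the input be non-empty is an added assumption.

```python
from typing import Iterable, List, Tuple

MAX_INPUT_WORDS = 15   # from the quality-control list
MIN_OUTPUT_WORDS = 20  # from the quality-control list


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed definition of 'word')."""
    return len(text.split())


def is_valid_pair(short_input: str, detailed_output: str) -> bool:
    """Apply the two length checks; the non-empty input check is an assumption."""
    n_in = word_count(short_input)
    n_out = word_count(detailed_output)
    return 1 <= n_in <= MAX_INPUT_WORDS and n_out >= MIN_OUTPUT_WORDS


def filter_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (input, output) pairs that pass validation."""
    return [(i, o) for i, o in pairs if is_valid_pair(i, o)]


candidates = [
    ("frozen waterfall", " ".join(["detail"] * 60)),  # passes both checks
    ("frozen waterfall", "too short"),                # output under 20 words
    (" ".join(["word"] * 16), " ".join(["detail"] * 60)),  # input over 15 words
]
valid = filter_pairs(candidates)
```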

Training Infrastructure

Trained on an Apple M4 Pro with 64GB unified memory using the MPS backend. Key lessons from training on Apple Silicon:

  • Use HuggingFace Trainer, not raw PyTorch loops (MPS shader compilation can hang with manual device placement)
  • Use float32 (BF16 training on MPS causes silent hangs or incorrect results)
  • Use batch_size=1 + gradient accumulation (float32 activations for 28-layer models blow up memory at larger batch sizes)
  • Enable gradient checkpointing (trades ~30% more compute for ~70% less activation memory)
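
As a rough sketch, the settings above and the hyperparameters from the Model Details table can be collected into keyword arguments for `transformers.TrainingArguments`. The mapping to these argument names is an assumption, since the actual training script is not shown here. The step counts also assume all 9,400 pairs were used for training, which gives an upper bound because the size of the eval split is not stated.

```python
import math

# Hyperparameters from the Model Details table, expressed with
# TrainingArguments-style names (the mapping itself is an assumption).
training_kwargs = {
    "per_device_train_batch_size": 1,   # batch_size=1 to limit activation memory
    "gradient_accumulation_steps": 32,  # effective batch size of 32
    "learning_rate": 2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.06,
    "num_train_epochs": 4,
    "weight_decay": 0.01,
    "gradient_checkpointing": True,
    "bf16": False,                      # float32 only on MPS
    "fp16": False,
}


def schedule_summary(num_train_examples: int, kwargs: dict) -> dict:
    """Approximate optimizer-step counts implied by the settings."""
    effective_batch = (
        kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]
    )
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * kwargs["warmup_ratio"])
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 9,400 is the full dataset size; the real training split may be smaller.
summary = schedule_summary(9_400, training_kwargs)
```

Under these assumptions the run works out to about 294 optimizer steps per epoch, 1,176 steps in total, and roughly 71 warmup steps. The exact counts reported by Trainer may differ slightly depending on the library version and the actual train/eval split.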

Citation

@misc{prompt-fungineer-v2,
  author = {Ritesh Khanna},
  title = {Prompt Fungineer v2: Distilling Prompt Engineering Into a Small Local Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/treadon/prompt-fungineer-v2}
}

Predecessor

treadon/prompt-fungineer-355M (GPT-2 Medium, 2023)
