FLUX.2-klein-base-9B GenEval2 Single-Reward (Flow-DPPO)

A LoRA adapter for black-forest-labs/FLUX.2-klein-base-9B, fine-tuned with Flow-DPPO on GenEval2 in the single-reward setting (optimizing the GenEval2 reward only).

Flow-DPPO

Flow-DPPO (Flow Divergence Proximal Policy Optimization) is an online reinforcement learning method for flow-matching image/video generators. Methods such as Flow-GRPO and Flow-CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clipping to enforce a trust region. Flow-DPPO argues that ratio clipping is a noisy, single-sample proxy for the true policy divergence, which over-constrains some parts of the trajectory and under-constrains others.
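For reference, the ratio-clipping surrogate that PPO-style methods optimize is usually written as below. This is the standard PPO form, not a formula taken from the Flow-DPPO paper; the symbols (per-step ratio r_t, advantage A, clip range epsilon, and denoising states x_t) are notation assumed here for illustration.

```latex
r_t(\theta) = \frac{\pi_\theta(x_{t-1} \mid x_t)}{\pi_{\theta_{\mathrm{old}}}(x_{t-1} \mid x_t)},
\qquad
L^{\mathrm{clip}}(\theta) = \mathbb{E}\left[ \min\left( r_t(\theta)\, A,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) A \right) \right]
```

The ratio r_t is evaluated on a single sampled transition, which is why it acts only as a one-sample proxy for how far the new policy has drifted from the old one.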

Because the per-step policy of a flow model is Gaussian, the KL divergence between the old and new policies can be computed exactly and cheaply. Flow-DPPO replaces ratio clipping with a divergence-proximal constraint, implemented as an asymmetric divergence mask: a gradient update is blocked only when (1) the advantage and ratio indicate the update is moving the policy away from the old policy, and (2) the exact KL already exceeds a threshold. Updates that move back toward the old policy are never blocked, accelerating recovery from overshooting.
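The masking rule above can be sketched in a few lines of plain Python. This is an illustrative reading of the description, not the authors' implementation: the function names, the scalar per-sample interface, and the assumption that the old and new per-step Gaussians share the same standard deviation sigma (so the KL reduces to a scaled squared distance between means) are all assumptions made for this example. "Moving away" is interpreted as the advantage and the log-ratio having the same sign.

```python
import math


def gaussian_kl(mu_old, mu_new, sigma):
    """Exact KL between N(mu_old, sigma^2 I) and N(mu_new, sigma^2 I).

    Assumption: both policies share the same isotropic std `sigma`,
    so the KL is ||mu_old - mu_new||^2 / (2 * sigma^2).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_old, mu_new))
    return sq_dist / (2.0 * sigma ** 2)


def keep_gradient(advantage, log_ratio, kl, kl_threshold):
    """Return False only if the update moves away from the old policy
    AND the exact KL already exceeds the threshold."""
    # Assumed reading of "moving away": a positive advantage pushes the
    # ratio up, a negative advantage pushes it down.
    moving_away = (advantage > 0 and log_ratio > 0) or (
        advantage < 0 and log_ratio < 0
    )
    return not (moving_away and kl > kl_threshold)


def masked_surrogate(advantage, log_ratio, kl, kl_threshold):
    """Per-sample objective ratio * advantage, zeroed when masked.

    In an autograd framework the mask would be a constant (detached)
    multiplier, so a blocked sample contributes no gradient.
    """
    if not keep_gradient(advantage, log_ratio, kl, kl_threshold):
        return 0.0
    return math.exp(log_ratio) * advantage


if __name__ == "__main__":
    kl = gaussian_kl([0.0, 0.0], [1.0, 1.0], sigma=1.0)
    # Positive advantage with ratio > 1 and large KL: blocked.
    print(keep_gradient(1.0, 0.2, kl, kl_threshold=0.1))
    # Positive advantage with ratio < 1: moves back toward the old policy, kept.
    print(keep_gradient(1.0, -0.2, kl, kl_threshold=0.1))
```

Note the asymmetry: a sample whose update would pull the policy back toward the old one is kept no matter how large the KL is, which is what distinguishes this mask from a symmetric KL cutoff.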

According to the paper, this yields higher reward, better KL-proximal efficiency (more reward gained per unit of divergence from the base policy), stronger robustness to catastrophic forgetting, more balanced multi-objective optimization, and stable multi-epoch training.

Paper: https://huggingface.co/papers/2606.11025
Code: https://github.com/Tencent-Hunyuan/UniRL/tree/main/FlowDPPO
Trained with the Flow-Factory RL framework.

Usage

black-forest-labs/FLUX.2-klein-base-9B is a gated model. Make sure you have accepted its license and are logged in (hf auth login). FLUX.2 currently requires diffusers from source: pip install git+https://github.com/huggingface/diffusers.git.

import torch
from diffusers import Flux2KleinPipeline
from peft import PeftModel

pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-9B",
    torch_dtype=torch.bfloat16,
)

# Load the Flow-DPPO LoRA adapter
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "Tencent-Hunyuan-Multimodal-RL/FLUX2-klein-base-9b-GenEval2-Single-Reward",
    torch_dtype=torch.bfloat16,
)

pipe.enable_model_cpu_offload()  # remove and call pipe.to("cuda") if you have enough VRAM

prompt = "four white cats are behind a red bagel"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("output.png")

Citation

@article{ping2026flowdppo,
  title={Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models},
  author={Ping, Bowen and Zhou, Xiangxin and Qi, Penghui and Luo, Minnan and Bo, Liefeng and Pang, Tianyu},
  journal={arXiv preprint arXiv:2606.11025},
  year={2026}
}

Framework versions

PEFT 0.19.1
