zyf515730395's Collection: Image Generation
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Paper
• 2506.07977
• Published • 41
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper
• 2506.07986
• Published • 19
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Paper
• 2506.06276
• Published • 26
Aligning Latent Spaces with Flow Priors
Paper
• 2506.05240
• Published • 27
Image Editing As Programs with Diffusion Models
Paper
• 2506.04158
• Published • 24
D-AR: Diffusion via Autoregressive Models
Paper
• 2505.23660
• Published • 34
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Paper
• 2505.23758
• Published • 22
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Paper
• 2505.18445
• Published • 63
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published • 77
Step1X-Edit: A Practical Framework for General Image Editing
Paper
• 2504.17761
• Published • 92
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
Paper
• 2504.14509
• Published • 53
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Paper
• 2504.07960
• Published • 50
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Paper
• 2504.02160
• Published • 37
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper
• 2503.09573
• Published • 77
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Paper
• 2503.07677
• Published • 86
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Paper
• 2503.07703
• Published • 37
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Paper
• 2503.16418
• Published • 36
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
• 2505.05470
• Published • 88
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Paper
• 2504.20690
• Published • 19
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Paper
• 2504.17789
• Published • 23
Seedream 3.0 Technical Report
Paper
• 2504.11346
• Published • 70
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Paper
• 2504.08736
• Published • 46
PixelFlow: Pixel-Space Generative Models with Flow
Paper
• 2504.07963
• Published • 18
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published • 14
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published • 11
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
• 2411.15098
• Published • 61
Flow Matching for Generative Modeling
Paper
• 2210.02747
• Published • 4
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Paper
• 2209.03003
• Published • 3
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper
• 2403.03206
• Published • 71
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper
• 2506.14603
• Published • 19
OmniGen2: Exploration to Advanced Multimodal Generation
Paper
• 2506.18871
• Published • 78
Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
Paper
• 2506.19713
• Published • 13
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Paper
• 2506.21416
• Published • 28
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
• 2507.05566
• Published • 116
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
Paper
• 2507.08441
• Published • 62
Qwen-Image Technical Report
Paper
• 2508.02324
• Published • 274
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Paper
• 2508.10711
• Published • 146
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Paper
• 2508.07981
• Published • 63
Reinforcement Learning in Vision: A Survey
Paper
• 2508.08189
• Published • 30
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
Paper
• 2508.08134
• Published • 10
Next Visual Granularity Generation
Paper
• 2508.12811
• Published • 49
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
Paper
• 2508.12880
• Published • 48
MultiRef: Controllable Image Generation with Multiple Visual References
Paper
• 2508.06905
• Published • 21
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
Paper
• 2508.09131
• Published • 17
OmniTry: Virtual Try-On Anything without Masks
Paper
• 2508.13632
• Published • 15
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper
• 2508.02193
• Published • 138
Seedream 4.0: Toward Next-generation Multimodal Image Generation
Paper
• 2509.20427
• Published • 84
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Paper
• 2509.16117
• Published • 23
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
Paper
• 2509.20360
• Published • 18
CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
Paper
• 2509.19300
• Published • 7
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper
• 2510.06308
• Published • 55
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Paper
• 2510.06590
• Published • 77
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published • 170
Latent Diffusion Model without Variational Autoencoder
Paper
• 2510.15301
• Published • 50
WithAnyone: Towards Controllable and ID Consistent Image Generation
Paper
• 2510.14975
• Published • 87
Learning an Image Editing Model without Image Editing Pairs
Paper
• 2510.14978
• Published • 9
The Principles of Diffusion Models
Paper
• 2510.21890
• Published • 64
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper
• 2510.08673
• Published • 127
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
Paper
• 2510.19871
• Published • 30
From Editor to Dense Geometry Estimator
Paper
• 2509.04338
• Published • 96
AToken: A Unified Tokenizer for Vision
Paper
• 2509.14476
• Published • 37
DoPE: Denoising Rotary Position Embedding
Paper
• 2511.09146
• Published • 98
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published • 63
ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
Paper
• 2601.03467
• Published • 7
SpotEdit: Selective Region Editing in Diffusion Transformers
Paper
• 2512.22323
• Published • 39
DreamOmni3: Scribble-based Editing and Generation
Paper
• 2512.22525
• Published • 15
LongCat-Flash-Thinking-2601 Technical Report
Paper
• 2601.16725
• Published • 180
FireRed-Image-Edit-1.0 Technical Report
Paper
• 2602.13344
• Published • 8
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
Paper
• 2602.12205
• Published • 80
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
Paper
• 2602.02437
• Published • 80