paper to review
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper
• 2312.02087
• Published • 22
FaceStudio: Put Your Face Everywhere in Seconds
Paper
• 2312.02663
• Published • 32
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper
• 2312.02432
• Published • 14
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper
• 2312.02981
• Published • 10
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
Paper
• 2312.02201
• Published • 35
Fine-grained Controllable Video Generation via Object Appearance and Context
Paper
• 2312.02919
• Published • 13
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper
• 2312.00777
• Published • 24
MoMask: Generative Masked Modeling of 3D Human Motions
Paper
• 2312.00063
• Published • 18
Make Pixels Dance: High-Dynamic Video Generation
Paper
• 2311.10982
• Published • 67
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published • 34
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published • 34
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
• 2312.03793
• Published • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
• 2312.04461
• Published • 62
Controllable Human-Object Interaction Synthesis
Paper
• 2312.03913
• Published • 23
Photorealistic Video Generation with Diffusion Models
Paper
• 2312.06662
• Published • 24
Context Tuning for Retrieval Augmented Generation
Paper
• 2312.05708
• Published • 16
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper
• 2312.09911
• Published • 55
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Paper
• 2312.09767
• Published • 27
Faithful Persona-based Conversational Dataset Generation with Large Language Models
Paper
• 2312.10007
• Published • 11
VecFusion: Vector Font Generation with Diffusion
Paper
• 2312.10540
• Published • 22
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Paper
• 2312.11461
• Published • 20
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
Paper
• 2312.11459
• Published • 6
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper
• 2312.10656
• Published • 11
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published • 49
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper
• 2312.14125
• Published • 47
Scalable Pre-training of Large Autoregressive Image Models
Paper
• 2401.08541
• Published • 38
Aria Everyday Activities Dataset
Paper
• 2402.13349
• Published • 31
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper
• 2402.13929
• Published • 27
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published • 45
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper
• 2402.17177
• Published • 87
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published • 194
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Paper
• 2402.17723
• Published • 16
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper
• 2403.03163
• Published • 98
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper
• 2403.03100
• Published • 37
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Paper
• 2403.05185
• Published • 23
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published • 72
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper
• 2403.20275
• Published • 10
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Paper
• 2404.04421
• Published • 18
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper
• 2404.07616
• Published • 15
KAN: Kolmogorov-Arnold Networks
Paper
• 2404.19756
• Published • 116
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Paper
• 2405.10637
• Published • 22