paper to review - a rowan-gim Collection

rowan-gim 's Collections

paper to review

paper to review

updated May 23, 2024

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Paper • 2312.02087 • Published Dec 4, 2023 • 22
FaceStudio: Put Your Face Everywhere in Seconds

Paper • 2312.02663 • Published Dec 5, 2023 • 32
Orthogonal Adaptation for Modular Customization of Diffusion Models

Paper • 2312.02432 • Published Dec 5, 2023 • 14
ReconFusion: 3D Reconstruction with Diffusion Priors

Paper • 2312.02981 • Published Dec 5, 2023 • 10
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

Paper • 2312.02201 • Published Dec 2, 2023 • 35
Fine-grained Controllable Video Generation via Object Appearance and Context

Paper • 2312.02919 • Published Dec 5, 2023 • 13
VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 24
MoMask: Generative Masked Modeling of 3D Human Motions

Paper • 2312.00063 • Published Nov 29, 2023 • 18
Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 67
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 34
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 34
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 62
Controllable Human-Object Interaction Synthesis

Paper • 2312.03913 • Published Dec 6, 2023 • 23
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24
Context Tuning for Retrieval Augmented Generation

Paper • 2312.05708 • Published Dec 9, 2023 • 16
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Paper • 2312.09767 • Published Dec 15, 2023 • 27
Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 11
VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 22
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Paper • 2312.11461 • Published Dec 18, 2023 • 20
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Paper • 2312.11459 • Published Dec 18, 2023 • 6
VidToMe: Video Token Merging for Zero-Shot Video Editing

Paper • 2312.10656 • Published Dec 17, 2023 • 11
Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 49
VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 47
Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16, 2024 • 38
Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20, 2024 • 31
SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21, 2024 • 27
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 45
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 87
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 194
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper • 2402.17723 • Published Feb 27, 2024 • 16
Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 98
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 37
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks

Paper • 2403.05185 • Published Mar 8, 2024 • 23
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

Paper • 2403.20275 • Published Mar 29, 2024 • 10
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Paper • 2404.04421 • Published Apr 5, 2024 • 18
Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11, 2024 • 15
KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 116
Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Paper • 2405.10637 • Published May 17, 2024 • 22