ethananhtran's Collections: Read But Not Implemented
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper
• 2512.16093
• Published • 97
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
• 2511.22699
• Published • 245
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published • 222
Sharp Monocular View Synthesis in Less Than a Second
Paper
• 2512.10685
• Published • 29
Latent Implicit Visual Reasoning
Paper
• 2512.21218
• Published • 70
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published • 94
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper
• 2512.04677
• Published • 177
Spatia: Video Generation with Updatable Spatial Memory
Paper
• 2512.15716
• Published • 34
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper
• 2512.19693
• Published • 67
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper
• 2511.14993
• Published • 233
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper
• 2512.11253
• Published • 40
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published • 170
Paper
• 2412.18653
• Published • 86
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Paper
• 2512.17504
• Published • 99
ProEdit: Inversion-based Editing From Prompts Done Right
Paper
• 2512.22118
• Published • 18
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
Paper
• 2511.22677
• Published • 35
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
Paper
• 2512.16900
• Published • 11
StoryMem: Multi-shot Long Video Storytelling with Memory
Paper
• 2512.19539
• Published • 19
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper
• 2512.23576
• Published • 66
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published • 154
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Paper
• 2512.23709
• Published • 51
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 321
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
• 2512.23959
• Published • 111
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Paper
• 2601.00664
• Published • 57
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published • 86
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Paper
• 2601.03252
• Published • 104
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
• 2601.02151
• Published • 113
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published • 230
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper
• 2601.04890
• Published • 44
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper
• 2601.03233
• Published • 176
MMFormalizer: Multimodal Autoformalization in the Wild
Paper
• 2601.03017
• Published • 106
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
• 2601.07348
• Published • 116
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
• 2601.08763
• Published • 150
VIBE: Visual Instruction Based Editor
Paper
• 2601.02242
• Published • 64
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
• 2601.08808
• Published • 39
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
Paper
• 2601.11655
• Published • 62
LongCat-Flash-Thinking-2601 Technical Report
Paper
• 2601.16725
• Published • 180