cool
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Paper
• 2504.02821
• Published • 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Paper
• 2504.17343
• Published • 13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Paper
• 2504.15921
• Published • 7
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper
• 2504.13263
• Published • 7
Distilling semantically aware orders for autoregressive image generation
Paper
• 2504.17069
• Published • 7
VideoDeepResearch: Long Video Understanding With Agentic Tool Using
Paper
• 2506.10821
• Published • 19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Paper
• 2506.07177
• Published • 23
lym00/Wan2.2_T2V_A14B_VACE-test
17B • Updated • 1.65k
• 42
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Paper
• 2509.18824
• Published • 23
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Paper
• 2509.24695
• Published • 47
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Paper
• 2509.25182
• Published • 39
lovis93/next-scene-qwen-image-lora-2509
Image-to-Image
• Updated • 20.2k
• 599
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
Paper
• 2510.08696
• Published • 15
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
Paper
• 2510.09561
• Published • 9
Video-to-Video
• Updated • 2.56k
• 178
Video-to-Video
• Updated • 74
meituan-longcat/LongCat-Video
Text-to-Video
• Updated • 896
• 454
Text-to-Video
• Updated • 300
• 274
TencentARC/RollingForcing
Text-to-Video
• Updated • 15
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Paper
• 2510.24821
• Published • 41
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published • 229