Collections
Collections including paper arxiv:2508.19205
-
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Paper • 2412.15322 • Published • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Paper • 2505.02707 • Published • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Paper • 2505.02625 • Published • 23
Fast Text-to-Audio Generation with Adversarial Post-Training
Paper • 2505.08175 • Published • 26
-
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 14
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 61
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7
-
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 90
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 43
VibeVoice Technical Report
Paper • 2508.19205 • Published • 164
-
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Paper • 2106.13914 • Published • 1
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Paper • 2506.15196 • Published • 3
Ascend HiFloat8 Format for Deep Learning
Paper • 2409.16626 • Published • 1
Recipes for Pre-training LLMs with MXFP8
Paper • 2506.08027 • Published • 1
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 22
-
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 80
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Paper • 2310.08588 • Published • 38
SALMON: Self-Alignment with Principle-Following Reward Models
Paper • 2310.05910 • Published • 2
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper • 2310.06830 • Published • 33