Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2508.19205

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164
Running on Zero

784

IndexTTS 2 Demo

🏢

784

Generate expressive speech audio from text with emotion control

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5, 2025 • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5, 2025 • 23
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13, 2025 • 26

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 14
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 61
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2, 2025 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29, 2025 • 7

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

Bugai's Collection

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 90
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Paper • 2106.13914 • Published Jun 26, 2021 • 1
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Paper • 2506.15196 • Published Jun 18, 2025 • 3
Ascend HiFloat8 Format for Deep Learning

Paper • 2409.16626 • Published Sep 25, 2024 • 1
Recipes for Pre-training LLMs with MXFP8

Paper • 2506.08027 • Published May 30, 2025 • 1

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

CodePlan: Repository-level Coding using LLMs and Planning

Paper • 2309.12499 • Published Sep 21, 2023 • 80
Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 38
SALMON: Self-Alignment with Principle-Following Reward Models

Paper • 2310.05910 • Published Oct 9, 2023 • 2
Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 33

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

Bugai's Collection

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 90
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 164
Running on Zero

784

IndexTTS 2 Demo

🏢

784

Generate expressive speech audio from text with emotion control

LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Paper • 2106.13914 • Published Jun 26, 2021 • 1
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Paper • 2506.15196 • Published Jun 18, 2025 • 3
Ascend HiFloat8 Format for Deep Learning

Paper • 2409.16626 • Published Sep 25, 2024 • 1
Recipes for Pre-training LLMs with MXFP8

Paper • 2506.08027 • Published May 30, 2025 • 1

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5, 2025 • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5, 2025 • 23
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13, 2025 • 26

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 14
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 61
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2, 2025 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29, 2025 • 7

CodePlan: Repository-level Coding using LLMs and Planning

Paper • 2309.12499 • Published Sep 21, 2023 • 80
Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 38
SALMON: Self-Alignment with Principle-Following Reward Models

Paper • 2310.05910 • Published Oct 9, 2023 • 2
Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 33

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs