---
title: J Moshi Arena
emoji: 🎙️
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
short_description: Compare two Moshi models with a tabbed interface
app_port: 7860
---
# 🎙️ Dual Moshi Model Interface

Compare two Moshi voice AI models side by side in a tabbed interface. Each model runs with full WebRTC support for real-time voice conversation.
## Available Models
### 1. Finetuned Step 9282

- **Repository:** `abePclWaseda/moshi-finetuned-step-9282`
- Fine-tuned Moshi model at training step 9282
- Optimized for specific use cases
- **Port:** 8998
### 2. J-Moshi (Japanese)

- **Repository:** `nu-dialogue/j-moshi`
- Japanese-optimized full-duplex spoken dialogue system
- Built on Moshi 7B with additional Japanese training data
- Supports natural turn-taking and backchannel responses (相槌, *aizuchi*)
- **Port:** 8999
## Features
- **Tabbed Interface:** Switch between models using tabs
- **Full WebRTC Support:** Complete `moshi.server` UI for each model
- **Dual Model Execution:** Both models run simultaneously
- **Real-time Voice:** Full-duplex conversation with microphone input
- **GPU Optimized:** Designed for 48GB+ GPU environments
## Architecture

The application runs three services:

- **Main Gradio Interface (port 7860)** - tabbed UI for model selection
- **Moshi Server 1 (port 8998)** - Finetuned Step 9282 model
- **Moshi Server 2 (port 8999)** - J-Moshi model
Each tab embeds the complete `moshi.server` interface with WebRTC support.
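The two backends described above can be started from a small launcher before the Gradio UI comes up. The sketch below is illustrative only: the repo IDs and ports come from this README, but the exact `moshi.server` CLI flags (`--hf-repo`, `--port`) are assumptions and may differ between moshi versions.

```python
import subprocess

# The two model backends listed in this README (repo ID, port).
MODELS = [
    ("abePclWaseda/moshi-finetuned-step-9282", 8998),
    ("nu-dialogue/j-moshi", 8999),
]

def server_command(repo_id: str, port: int) -> list[str]:
    # Build the command line for one backend process.
    # NOTE: --hf-repo/--port are assumed flags; check your moshi version.
    return ["python", "-m", "moshi.server", "--hf-repo", repo_id, "--port", str(port)]

def launch_all() -> list[subprocess.Popen]:
    # Start both backends concurrently; the Gradio UI on port 7860
    # then embeds each server's web interface in its own tab.
    return [subprocess.Popen(server_command(repo, port)) for repo, port in MODELS]
```

Running both processes concurrently is what makes tab switching instant: neither model is loaded on demand, so the cost is paid once at startup in VRAM rather than per interaction.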
## Requirements
- **GPU:** A100 (48GB) or equivalent recommended
- **Memory:** ~48GB GPU VRAM (about 24GB per model, with both running simultaneously)
- **Docker:** Containerized deployment
## Usage
1. Open the application (port 7860)
2. Click a tab to select a model
3. Click inside the embedded interface to interact with Moshi
4. Use the microphone button to start a voice conversation
5. Switch tabs to compare the models
## Technical Details
- **Framework:** Gradio + `moshi.server`
- **Models:** Moshi (7B parameters each)
- **Codec:** Mimi audio codec
- **Ports:** 7860 (main), 8998 and 8999 (model servers)
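Since the app exposes three services, a quick way to confirm everything came up is to probe each port. This is a hedged sketch: the root path `/` responding on each port is an assumption about the servers' routing.

```shell
#!/usr/bin/env bash
# Health check for the three services listed above.
# Ports come from this README; the "/" path is assumed to respond.
check_port() {
  local port="$1"
  if curl -fsS --max-time 2 "http://localhost:${port}/" > /dev/null 2>&1; then
    echo "port ${port}: up"
  else
    echo "port ${port}: down"
  fi
}

for port in 7860 8998 8999; do
  check_port "$port"
done
```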