J-Moshi-Arena / README.md
yutoAb
Add application file
4ee8d5c
metadata
title: J Moshi Arena
emoji: 📊
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
short_description: Compare two Moshi models with tabbed interface
app_port: 7860

🎙️ Dual Moshi Model Interface

Compare two Moshi voice AI models through a single tabbed interface. Each model runs in its own tab with full WebRTC support for real-time voice conversation.

Available Models

1. Finetuned Step 9282

  • Repository: abePclWaseda/moshi-finetuned-step-9282
  • Fine-tuned Moshi model at training step 9282
  • Optimized for specific use cases
  • Port: 8998

2. J-Moshi (Japanese)

  • Repository: nu-dialogue/j-moshi
  • Japanese-optimized full-duplex spoken dialogue system
  • Built on Moshi 7B with additional Japanese training data
  • Supports natural turn-taking and backchannel responses (相槌, aizuchi)
  • Port: 8999

Features

  • Tabbed Interface: Switch between models using tabs
  • Full WebRTC Support: Complete moshi.server UI for each model
  • Dual Model Execution: Both models run simultaneously
  • Real-time Voice: Full-duplex conversation with microphone input
  • GPU Optimized: Designed for 48GB+ GPU environments

Architecture

The application runs:

  1. Main Gradio Interface (Port 7860) - Tabbed UI for model selection
  2. Moshi Server 1 (Port 8998) - Finetuned Step 9282 model
  3. Moshi Server 2 (Port 8999) - J-Moshi model

Each tab embeds the complete moshi.server interface with WebRTC support.
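
The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the Space's actual application file: the `ModelServer` type, the helper names, the iframe URL scheme, and the exact `moshi.server` flags (`--hf-repo`, `--port`) are assumptions, and in a hosted Space only `app_port` is exposed, so the real app likely needs a proxy in front of ports 8998 and 8999.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelServer:
    """Hypothetical record describing one model server (not part of moshi)."""
    label: str    # tab title shown in the UI
    hf_repo: str  # Hugging Face repository holding the weights
    port: int     # port the moshi.server instance listens on


# Repositories and ports are taken from the model list in this README.
SERVERS = [
    ModelServer("Finetuned Step 9282", "abePclWaseda/moshi-finetuned-step-9282", 8998),
    ModelServer("J-Moshi (Japanese)", "nu-dialogue/j-moshi", 8999),
]


def server_command(server: ModelServer) -> list[str]:
    """Command line for one model server (flag names are assumptions)."""
    return [sys.executable, "-m", "moshi.server",
            "--hf-repo", server.hf_repo, "--port", str(server.port)]


def iframe_html(server: ModelServer, base_url: str = "http://localhost") -> str:
    """HTML embedding one server UI; allow="microphone" lets the frame record audio."""
    return (f'<iframe src="{base_url}:{server.port}/" allow="microphone" '
            f'style="width:100%;height:800px;border:0"></iframe>')


def build_demo():
    """Tabbed Gradio UI; gradio is imported lazily so the helpers work without it."""
    import gradio as gr

    with gr.Blocks(title="J Moshi Arena") as demo:
        for server in SERVERS:
            with gr.Tab(server.label):
                gr.HTML(iframe_html(server))
    return demo


def main() -> None:
    """Start both model servers, then serve the tabbed UI on port 7860."""
    import subprocess

    procs = [subprocess.Popen(server_command(s)) for s in SERVERS]
    try:
        build_demo().launch(server_name="0.0.0.0", server_port=7860)
    finally:
        for proc in procs:
            proc.terminate()
```

Calling `main()` would start the three processes listed above. Keeping the server list in one place means a third model only needs one more `ModelServer` entry.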

Requirements

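  • GPU: 48 GB-class GPU or larger recommended (the A100 ships in 40 GB and 80 GB variants, so the 80 GB model is the one that fits)
  • Memory: about 48 GB of GPU VRAM in total (roughly 24 GB per model, with both models loaded simultaneously)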
  • Docker: Containerized deployment
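
As a rough sanity check on the memory figures above, the arithmetic can be written out. The sketch assumes the weights are stored at 2 bytes per parameter (bf16 or fp16), which this README does not state; the 24 GB per-model budget and the 7B parameter count come from the README itself.

```python
def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Weight footprint in decimal GB; 2 bytes per parameter assumes bf16/fp16."""
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billions * bytes_per_param


PER_MODEL_BUDGET_GB = 24  # per-model figure from this README
NUM_MODELS = 2            # both models are loaded at the same time

weights = weight_gb(7)                     # 7B-parameter Moshi model
headroom = PER_MODEL_BUDGET_GB - weights   # left for codec, caches, CUDA context
total = PER_MODEL_BUDGET_GB * NUM_MODELS

print(f"weights per model:  {weights:.0f} GB")
print(f"headroom per model: {headroom:.0f} GB")
print(f"total budget:       {total} GB")
```

Under that assumption the weights take about 14 GB per model, leaving roughly 10 GB of the 24 GB budget for everything else.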

Usage

  1. Open the application (port 7860)
  2. Click on a tab to select which model to use
  3. Click inside the embedded interface to interact with Moshi
  4. Use the microphone button to start voice conversation
  5. Switch tabs to compare different models
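
If a tab stays blank, one quick thing to check is whether all three services are actually listening. The helper below is a hypothetical diagnostic, not part of the application; it only tests that a TCP connection succeeds and says nothing about whether the model has finished loading.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_ports(host: str = "127.0.0.1", ports=(7860, 8998, 8999)) -> dict[int, bool]:
    """Map each port from this README to whether something is listening on it."""
    return {port: port_open(host, port) for port in ports}
```

Running `check_ports()` inside the container returns a dictionary keyed by port, so a `False` entry points at the service that failed to start.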

Technical Details

  • Framework: Gradio + moshi.server
  • Models: Moshi (7B parameters each)
  • Codec: Mimi neural audio codec
  • Ports: 7860 (main), 8998, 8999 (model servers)