Magpie TTS Multilingual 357M - Core ML
Core ML port of NVIDIA's Magpie TTS Multilingual 357M, packaged for on-device inference on Apple Silicon (iPhone, iPad, Mac).
This repository contains only the converted Core ML model artifacts. Model weights, architecture, and training are entirely NVIDIA's work; all credit for the underlying model goes to the Magpie TTS team. This port adds only the Core ML conversion, iOS/macOS runtime integration, and packaging.
What's included
| File | Role |
|---|---|
| `TextEncoder.mlmodelc` | Text → encoder hidden states |
| `DecoderPrefill.mlmodelc` | Batched speaker-context prefill (populates the KV cache in one pass) |
| `DecoderStep.mlmodelc` | Single autoregressive step with explicit KV cache I/O |
| `NanocodecDecoder.mlmodelc` | Codec tokens → 22 kHz waveform |
All four are compiled `.mlmodelc` bundles, ready to load via `MLModel(contentsOf:)`.
FP16 weights, `minimum_deployment_target` = iOS 17.
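Because the bundles are already compiled, they can be loaded directly without a `MLModel.compileModel(at:)` step. The sketch below assumes all four `.mlmodelc` directories sit side by side in one folder (for example the app's resource folder); `compiledModelURL`, `loadCompiledModel`, and `printInterface` are hypothetical helper names, not part of this repository. Since this README does not list each model's input and output feature names, the sketch prints them from `modelDescription` so you can discover the I/O contract before wiring up the pipeline.

```swift
import CoreML
import Foundation

// The four compiled bundles shipped in this repository.
let magpieModelNames = ["TextEncoder", "DecoderPrefill", "DecoderStep", "NanocodecDecoder"]

// Hypothetical helper. ASSUMPTION: every `<name>.mlmodelc` directory lives
// directly inside `directory`.
func compiledModelURL(named name: String, in directory: URL) -> URL {
    directory.appendingPathComponent("\(name).mlmodelc", isDirectory: true)
}

// Loads an already-compiled bundle; no compileModel(at:) call is required.
func loadCompiledModel(named name: String,
                       in directory: URL,
                       configuration: MLModelConfiguration = MLModelConfiguration()) throws -> MLModel {
    try MLModel(contentsOf: compiledModelURL(named: name, in: directory),
                configuration: configuration)
}

// Prints the feature names a model expects and produces.
func printInterface(of model: MLModel, name: String) {
    let description = model.modelDescription
    print(name)
    print("  inputs: ", description.inputDescriptionsByName.keys.sorted())
    print("  outputs:", description.outputDescriptionsByName.keys.sorted())
}

// Loads every bundle found in `directory` and prints its interface.
func inspectAll(in directory: URL) {
    for name in magpieModelNames {
        do {
            let model = try loadCompiledModel(named: name, in: directory)
            printInterface(of: model, name: name)
        } catch {
            print("Could not load \(name): \(error)")
        }
    }
}
```

In an app you might call `inspectAll(in:)` once with `Bundle.main.resourceURL` during development; the default `MLModelConfiguration` used here lets Core ML pick compute units, which you will likely want to override per model.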
Languages
English, Spanish, German, Mandarin, French, Italian, Vietnamese, Hindi, Japanese (9 languages, matching the NVIDIA original).
Model details
- Base architecture: 12-layer causal decoder, `d_model` = 768, 12 self-attention heads, `d_head` = 64
- 8 audio codebooks, 2016 codes + 8 special tokens each
- Local transformer: 1-layer causal, `d` = 256, samples codebooks autoregressively per frame
- Max text length: 256 tokens; max decoder sequence: 512
- Output sample rate: 22 kHz
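The dimensions above are enough for a back-of-the-envelope estimate of the decoder's KV cache footprint, which is useful when budgeting memory for `DecoderStep`'s explicit cache I/O. This is a sketch under stated assumptions, not a description of the actual tensor layout: it assumes one K and one V tensor per layer, each shaped `[512, 768]` in FP16 (2 bytes per element), batch size 1, and no padding. `MagpieDims` is a hypothetical name used only for illustration.

```swift
// Sizing derived from the Model details list.
// ASSUMPTION: KV cache = K and V per layer, each [maxDecoderSeq, dModel],
// FP16, batch size 1. The real DecoderStep tensors may be shaped differently.
struct MagpieDims {
    let layers = 12
    let heads = 12
    let dHead = 64
    let maxDecoderSeq = 512
    let codebooks = 8
    let codesPerCodebook = 2016
    let specialTokensPerCodebook = 8

    // 12 heads × 64 = 768, matching the stated d_model.
    var dModel: Int { heads * dHead }

    // Token ids per codebook, including the special tokens.
    var vocabPerCodebook: Int { codesPerCodebook + specialTokensPerCodebook }

    // layers × (K and V) × positions × d_model × 2 bytes per FP16 value.
    var kvCacheBytesFP16: Int { layers * 2 * maxDecoderSeq * dModel * 2 }
}

let dims = MagpieDims()
print("d_model =", dims.dModel)
print("vocab per codebook =", dims.vocabPerCodebook)
print("KV cache (FP16, full length) =", Double(dims.kvCacheBytesFP16) / 1_048_576, "MiB")
```

Under these assumptions the full-length cache is 18,874,368 bytes (18 MiB), small next to the FP16 weights that each autoregressive step must stream.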
Compute-unit guidance
Tested on iPhone 15 Pro and M-series Macs:
| Model | Recommended compute unit | Notes |
|---|---|---|
| `TextEncoder` | `.cpuAndNeuralEngine` | ANE-friendly |
| `DecoderPrefill` | `.cpuAndNeuralEngine` | Batched; benefits from the ANE |
| `DecoderStep` | `.cpuOnly` | Weight-bandwidth bound; with Apple Silicon's unified memory the CPU matches the GPU here and avoids per-step GPU dispatch overhead. Also background-safe, since Metal (GPU) work is suspended while the app is in the background. |
| `NanocodecDecoder` | `.cpuOnly` | Contains ops/dimensions that exceed ANE limits; the CPU also beats the GPU here. |
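One way to apply the recommendations in the table above is to build a separate `MLModelConfiguration` per bundle and set its `computeUnits`. The sketch below is a minimal illustration; `recommendedComputeUnits(for:)` and `makeConfiguration(for:)` are hypothetical helper names, and the `.all` fallback for unlisted names is an assumption, not a recommendation from this README. `.cpuAndNeuralEngine` requires iOS 16 / macOS 13 or later, which the iOS 17 deployment target already satisfies.

```swift
import CoreML

// Maps each bundle name to the compute unit recommended in the table.
func recommendedComputeUnits(for modelName: String) -> MLComputeUnits {
    switch modelName {
    case "TextEncoder", "DecoderPrefill":
        // ANE-friendly / batched prefill.
        return .cpuAndNeuralEngine
    case "DecoderStep", "NanocodecDecoder":
        // Bandwidth-bound per-step decode and codec with ANE-incompatible ops;
        // CPU-only also keeps these usable while the app is backgrounded.
        return .cpuOnly
    default:
        // Hypothetical fallback: let Core ML decide for anything not listed.
        return .all
    }
}

// Builds a configuration to pass to MLModel(contentsOf:configuration:).
func makeConfiguration(for modelName: String) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = recommendedComputeUnits(for: modelName)
    return config
}
```

Keeping one configuration per model, rather than one shared configuration, lets the encoder and prefill use the ANE while the step loop and codec stay on the CPU.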
License & Attribution
This port inherits the license of the base model from NVIDIA. See the original NVIDIA model card for terms.
The model weights, architecture, and training are NVIDIA's work. This repository provides only the Core ML packaging. Please cite and credit the NVIDIA Magpie TTS team for any use of the underlying model.
Links
- Original model: https://huggingface.co/nvidia/magpie-tts-multilingual