Magpie TTS Multilingual 357M — Core ML

Core ML port of NVIDIA's Magpie TTS Multilingual 357M, packaged for on-device inference on Apple Silicon (iPhone, iPad, Mac).

This repository contains only the converted Core ML model artifacts. The model weights, architecture, and training are entirely NVIDIA's work, and all credit for the underlying model goes to the Magpie TTS team. This port adds only the Core ML conversion, iOS/macOS runtime integration, and packaging.

What's included

| File | Role |
| --- | --- |
| `TextEncoder.mlmodelc` | Text → encoder hidden states |
| `DecoderPrefill.mlmodelc` | Batched speaker-context prefill (populates KV cache in one pass) |
| `DecoderStep.mlmodelc` | Single autoregressive step with explicit KV cache I/O |
| `NanocodecDecoder.mlmodelc` | Codec tokens → 22 kHz waveform |

All four are compiled `.mlmodelc` bundles, ready to load via `MLModel(contentsOf:)`. Weights are FP16, and the models were converted with `minimum_deployment_target` set to iOS 17.
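As a rough illustration of how the four bundles fit together at inference time, the control flow can be sketched in plain Swift. Everything here is an assumption made for illustration: the `TTSStages` type, the closure signatures, the `KVCache` alias, and the "return `nil` to stop" convention are hypothetical stand-ins, not the actual model interfaces. The real input and output feature names and shapes should be read from each model's `modelDescription`.

```swift
// Hypothetical, simplified view of the four stages. Real models exchange
// MLMultiArray features; flat arrays are used here only to show the flow.
typealias KVCache = [Float]

struct TTSStages {
    /// TextEncoder: text token IDs -> encoder hidden states.
    var encodeText: ([Int32]) -> [Float]
    /// DecoderPrefill: builds the initial KV cache in one batched pass.
    /// (Assumed here to take the encoder states as its only input.)
    var prefill: ([Float]) -> KVCache
    /// DecoderStep: updates the cache in place and returns one frame of
    /// codec tokens (one per codebook), or nil when generation should stop.
    var step: (inout KVCache, Int) -> [Int32]?
    /// NanocodecDecoder: frames of codec tokens -> waveform samples.
    var decodeAudio: ([[Int32]]) -> [Float]
}

/// Runs encoder -> prefill -> autoregressive steps -> codec decoder.
/// `maxSteps` defaults to the 512-position decoder limit; if the prefill
/// occupies part of that window, the usable step count is smaller.
func synthesize(tokens: [Int32], stages: TTSStages, maxSteps: Int = 512) -> [Float] {
    let hidden = stages.encodeText(tokens)
    var cache = stages.prefill(hidden)
    var frames: [[Int32]] = []
    for position in 0..<maxSteps {
        // Stop as soon as the step stage signals end of sequence.
        guard let frame = stages.step(&cache, position) else { break }
        frames.append(frame)
    }
    return stages.decodeAudio(frames)
}
```

Keeping the stages behind closures like this makes the loop easy to exercise with stubs before wiring in the real `MLModel` calls.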

Languages

English, Spanish, German, Mandarin, French, Italian, Vietnamese, Hindi, Japanese (9 languages, matching the NVIDIA original).

Model details

Base architecture: 12-layer causal decoder, d_model = 768, 12 self-attention heads, d_head = 64
8 audio codebooks, 2016 codes + 8 special tokens each
Local transformer: 1-layer causal, d = 256, samples codebooks autoregressively per frame
Max text length: 256 tokens; max decoder sequence length: 512
Output sample rate: 22 kHz
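
The figures above allow a quick back-of-the-envelope check of the decoder's dimensions and cache footprint. The sketch below assumes the self-attention KV cache holds one key and one value tensor per layer at the full 512-position length, stored as FP16. The actual cache layout and dtype of `DecoderStep` are not stated here, and any cross-attention cache is not counted, so treat the result as an estimate.

```swift
// Figures copied from the model details above.
let layers = 12, heads = 12, headDim = 64
let maxDecoderLength = 512
let codebooks = 8, codesPerCodebook = 2016, specialTokens = 8

// Sanity check: heads * d_head should equal d_model.
let modelDim = heads * headDim

// Assumption: 2-byte (FP16) cache entries, K and V for every layer,
// allocated for the full decoder length.
let bytesPerValue = 2
let kvCacheBytes = 2 * layers * heads * headDim * maxDecoderLength * bytesPerValue

// Each codebook predicts over its regular codes plus special tokens.
let vocabPerCodebook = codesPerCodebook + specialTokens
let logitsPerFrame = codebooks * vocabPerCodebook

print("d_model:", modelDim)                                  // 768
print("KV cache (MiB):", kvCacheBytes / (1024 * 1024))       // 18
print("vocabulary per codebook:", vocabPerCodebook)          // 2024
print("logits per frame across codebooks:", logitsPerFrame)  // 16192
```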

Compute-unit guidance

Tested on iPhone 15 Pro and M-series Macs:

| Model | Recommended compute unit | Notes |
| --- | --- | --- |
| `TextEncoder` | `.cpuAndNeuralEngine` | ANE-friendly |
| `DecoderPrefill` | `.cpuAndNeuralEngine` | Batched, benefits from ANE |
| `DecoderStep` | `.cpuOnly` | Weight-bandwidth bound; CPU matches GPU on Apple Silicon unified memory and avoids per-step GPU dispatch overhead. Also background-safe (Metal is suspended in background). |
| `NanocodecDecoder` | `.cpuOnly` | Contains ops/dimensions that exceed ANE limits; CPU beats GPU here too. |
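
A minimal sketch of applying these recommendations when loading the bundles is shown below. It uses the standard `MLModelConfiguration.computeUnits` property and `MLModel(contentsOf:configuration:)`. The `recommendedComputeUnits` table and the `loadModel(named:from:)` helper are hypothetical names introduced for this example, and the directory argument is assumed to be wherever the four `.mlmodelc` folders were copied.

```swift
import CoreML
import Foundation

// Recommended compute units from the table above, keyed by bundle name.
let recommendedComputeUnits: [String: MLComputeUnits] = [
    "TextEncoder": .cpuAndNeuralEngine,
    "DecoderPrefill": .cpuAndNeuralEngine,
    "DecoderStep": .cpuOnly,
    "NanocodecDecoder": .cpuOnly,
]

/// Loads `<name>.mlmodelc` from `directory` with its recommended compute
/// units. Unknown names fall back to `.all` (an arbitrary choice made for
/// this sketch, not a recommendation).
func loadModel(named name: String, from directory: URL) throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = recommendedComputeUnits[name] ?? .all
    let url = directory.appendingPathComponent("\(name).mlmodelc", isDirectory: true)
    return try MLModel(contentsOf: url, configuration: configuration)
}
```

For example, if the bundles ship as app resources, `try loadModel(named: "DecoderStep", from: Bundle.main.resourceURL!)` would load the step model on the CPU. Loading each model once and reusing the instance avoids repeating the load cost per utterance.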

License & Attribution

This port inherits the license of the base model from NVIDIA. See the original NVIDIA model card for terms.

The model weights, architecture, and training are NVIDIA's work. This repository provides only the Core ML packaging. Please cite and credit the NVIDIA Magpie TTS team for any use of the underlying model.