Paper: Scaling Speech Technology to 1,000+ Languages (arXiv:2305.13516)

Fine-tuned version of `facebook/mms-tts-las` on the Eyaa-Tom dataset for Lamba (Togo), ISO 639-3 code `las`.

| Field | Value |
|---|---|
| Language | Lamba (Togo) |
| ISO 639-3 (MMS) | las |
| ISO 639-3 (Eyaa-Tom) | las |
| Region | Togo |
| Family | Gur (Niger-Congo) |
| Base model | facebook/mms-tts-las |

| Metric | Value |
|---|---|
| Training samples | 45 |
| Validation samples | 8 |
| Best validation mel-spectrogram L1 loss (lower is better) | 3.6687 |
| Uploaded checkpoint | best (lowest validation mel-L1) |
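
```python
from transformers import VitsModel, VitsTokenizer, set_seed
import torch
import torchaudio
model = VitsModel.from_pretrained("Umbaji001/eyaa-tom-mms-tts-las")
tokenizer = VitsTokenizer.from_pretrained("Umbaji001/eyaa-tom-mms-tts-las")
# Replace with the Lamba text to synthesize.
inputs = tokenizer("your text here", return_tensors="pt")
set_seed(555)  # VITS sampling is stochastic; fix the seed for reproducible audio
with torch.no_grad():
    waveform = model(**inputs).waveform[0]  # 1-D tensor of samples
# torchaudio.save expects a (channels, samples) tensor.
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
```
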
Citation (MMS base model):

```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```

Fine-tuned: 2026-02-25, Eyaa-Tom project