Shona F5-TTS Voice

Shona F5-TTS Voice is a Shona (sna) text-to-speech model built on top of SWivid/F5-TTS. This repository replaces an earlier checkpoint with the stronger 150-sample fine-tune, which performed best in listening evaluations against the other Shona TTS systems tested.

Model Details

Author: Manasseh Changachirere (Harare Institute of Technology)
Base model: SWivid/F5-TTS
Phase 1 training dataset: Shekharmeena/Shona-Male-Audio-Dataset
Phase 2 training dataset: manassehzw/sna-manasseh-150-raw
Language: Shona
Model family: F5-TTS
Prepared training rows: 150
Prepared training duration: 0.2483 hours
Configured epochs: 50
Learning rate: 1.50e-05
Run started: 2026-05-07T18:55:28.784481+00:00
Run finished: 2026-05-07T19:09:15.967322+00:00
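
The figures above imply a few derived quantities that are handy when comparing runs. The sketch below only does arithmetic on values copied from this list; the variable names are illustrative and are not part of the repository.

```python
from datetime import datetime

# Values copied from the Model Details list.
PREPARED_ROWS = 150
PREPARED_HOURS = 0.2483
RUN_STARTED = "2026-05-07T18:55:28.784481+00:00"
RUN_FINISHED = "2026-05-07T19:09:15.967322+00:00"

# Total prepared audio and the mean clip length.
total_audio_seconds = PREPARED_HOURS * 3600
mean_clip_seconds = total_audio_seconds / PREPARED_ROWS

# Wall-clock time between the recorded start and finish timestamps.
run_seconds = (
    datetime.fromisoformat(RUN_FINISHED) - datetime.fromisoformat(RUN_STARTED)
).total_seconds()

print(f"total audio: {total_audio_seconds:.2f} s")   # 893.88 s
print(f"mean clip:   {mean_clip_seconds:.2f} s")     # 5.96 s
print(f"run time:    {run_seconds / 60:.2f} min")    # 13.79 min
```

So the prepared set is roughly 15 minutes of audio at about 6 seconds per clip, and the recorded run lasted just under 14 minutes.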

Overview

This model is adapted in two stages from the F5-TTS base model for Shona speech synthesis. The winning release first adapts the base model on a broader Shona male speech corpus, then applies a LoRA identity pass on a curated 150-sample single-speaker Shona dataset. This package includes the validated final checkpoint, tokenizer vocabulary, training metadata, and held-out evaluation samples for research and downstream voice application testing.

Training Pipeline

Phase 1: full adaptation of the F5-TTS base model on the broader Shona dataset.
Phase 2: LoRA identity adaptation on the 150-sample single-speaker dataset to improve speaker similarity, stability, and robustness.
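
For readers unfamiliar with LoRA, the general idea behind Phase 2 can be sketched as a frozen weight matrix plus a trainable low-rank update. This is a generic illustration in plain Python, not the training code used for this run; the rank, the scaling factor `alpha`, and which layers were adapted are not stated in this card, so every value below is a placeholder.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x).

    W is the frozen base weight; A (rank x in) and B (out x rank) are the
    small trainable matrices. All shapes and values here are hypothetical.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (placeholder)
A = [[1.0, 1.0]]               # rank-1 down-projection (placeholder)
x = [2.0, 3.0]

# With B initialised to zero, the adapted layer matches the base layer.
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, rank=1))  # [2.0, 3.0]

# After training, a non-zero B shifts the output by the low-rank update.
print(lora_forward(W, A, [[1.0], [2.0]], x, alpha=2.0, rank=1))  # [12.0, 23.0]
```

Because only A and B are trained, a pass of this kind can specialise a model on a small dataset such as the 150-sample set while leaving the Phase 1 weights untouched.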

Files

model.pt: full compatibility checkpoint exported from the validated run
model.safetensors: inference-oriented weight export
vocab.txt: tokenizer vocabulary used for training and inference
research/train_config.yaml: generated training configuration for this run
research/summary.json: run summary and artifact paths
research/prep_summary.json: prepared dataset summary
samples/: held-out evaluation generations when included (final_eval_ref01_nfe32)
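
After downloading the repository, a quick check that the listed files are present can save a failed inference run. The sketch below assumes the files sit in a local directory with the layout described above; the helper name `missing_files` is illustrative, and `samples/` is skipped because the list marks it as optional.

```python
from pathlib import Path

# File names taken from the Files list; samples/ is optional and not checked.
REQUIRED_FILES = [
    "model.pt",
    "model.safetensors",
    "vocab.txt",
    "research/train_config.yaml",
    "research/summary.json",
    "research/prep_summary.json",
]


def missing_files(root):
    """Return the required files that are absent under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/local/copy")` returns an empty list when everything is in place, and otherwise the relative paths that still need to be fetched.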

Intended Use

This model is intended for:

Shona TTS research
voice agent prototyping
single-speaker adaptation experiments
comparative benchmarking against Spark-TTS and other Shona TTS systems

It is not positioned as a production-hardened commercial speech API.

Compatibility

This repository does not follow the standard transformers text-to-speech layout. It is intended for the F5-TTS / sna-f5-tts inference stack used in this project.

Inference Notes

This checkpoint works best with a short, clean reference clip and accurate reference text.
Long-form synthesis is still best handled by chunking.
Faster inference is possible by lowering NFE steps, with some quality tradeoff.
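
The chunking advice above can be sketched as a small sentence-aware splitter. This is a minimal illustration, not the chunking logic of the F5-TTS or sna-f5-tts stack; the function name and the 200-character default are assumptions chosen for the example.

```python
import re


def chunk_text(text, max_chars=200):
    """Group sentences into chunks of at most `max_chars` characters.

    Sentences are split after '.', '!' or '?'. A single sentence longer
    than `max_chars` is kept whole rather than cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately with the same reference clip and the resulting audio concatenated. A smaller `max_chars` gives shorter generations per call, at the cost of more joins.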

Samples

The table below lists the uploaded evaluation WAVs. Each row includes an inline audio player, with a direct link as a fallback.

File	Text	Audio
`sample_01.wav`	Mangwanani shamwari yangu, ndafara kukuwona nhasi. Ndaida kukubvunza kuti zvinhu zvirisei kubasa uye mhuri yakasimba here, tinogona kusangana manheru here? tigoronga svondo nemafaro patinenge tapedza kunamata.	WAV
`sample_02.wav`	Nezuro ndakaenda kumusika mangwanani, ndikawana miriwo, madomasi, nehanyanisi, asi mitengo yacho yakanga yakwira zvishoma. Ndakazobika sadza nenyama kumba, vana vakati chikafu chainaka kwazvo.	WAV
`sample_03.wav`	Kana uchida kuti chirongwa ichi chifambe zvakanaka, tinofanira kutanga taronga kupimana kwebasa, topatsanura nguva yekudzidza, tobva taziva zvinotarisirwa pakupera kwemwedzi. Kana tikashanda pamwe chete, tinokwanisa kusvika pazvinangwa zvedu.	WAV
`sample_04.wav`	Mumugwagwa mune motokari dzakawanda nhasi, ndozvaiita kuti ndinonoke kusvika, ndaedza kutsvaga imwe nzira iriclear ndokuzosvika. Dzimwe nguva kufamba muguta kunotoda patience nekuti pa peak hour munenge makazara.	WAV
`sample_05.wav`	Mamukasei, mhuri yakadini, makazofamba mushe here takazorasana paye. Ini ndakasvika zvakanaka chose. Mugokwazisa baba vaLinda ne rimwe team rese.	WAV

Training Provenance

Base model: SWivid/F5-TTS
Phase 1 dataset: Shekharmeena/Shona-Male-Audio-Dataset
Phase 2 dataset: manassehzw/sna-manasseh-150-raw
Checkpoint path used for publication: /root/project/runs/shona_f5_tts_identity/shona_f5_tts_identity-m150-n150-lora-lr2em5-ep50-May-07-2026_06+55PM-2fecae2/final_model/model_last.pt
Research metadata: available under research/

Limitations

This is a research checkpoint, and output quality may still vary when the prompt and reference are mismatched.
Code-switching performance can depend heavily on how much multilingual material was present in the fine-tuning data.
Live conversational use may still need chunked delivery or optimized runtime serving for best latency.

Citation

If you use this model, please also credit the upstream F5-TTS project (SWivid/F5-TTS).

Safetensors

Model size: 0.3B params
Tensor type: F32
