Fongbe-French ASR model without diacritics

How to use for inference

from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

asr_model = StreamingASR.from_hparams(
    source="whettenr/asr-fon-french-streaming-conformer-without-diacritics",
    savedir="pretrained_models/asr-fon-french-streaming-conformer-without-diacritics"
)

asr_model.transcribe_file(
    "whettenr/asr-fon-without-diacritics/example_fon.wav",
    # select a chunk size of ~960ms with 8 chunks of left context
    DynChunkTrainConfig(24, 8),
    # disable torchaudio streaming to allow fetching from HuggingFace
    # set this to True for your own files or streams to allow for streaming file decoding
    use_torchaudio_streaming=False,
)

# expected output fon:
# huzuhuzu gɔngɔn ɖe ɖo dandan
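The two `DynChunkTrainConfig` arguments above translate directly into a streaming latency budget. A minimal sketch of that arithmetic, assuming the common streaming-Conformer frontend of a 10 ms feature hop with 4x convolutional subsampling (so one encoder frame covers roughly 40 ms); the 40 ms frame duration is an assumption, not a value read from this checkpoint:

```python
# Rough latency math for DynChunkTrainConfig(24, 8), assuming each
# encoder frame covers 40 ms (10 ms hop x 4x convolutional subsampling).
FRAME_MS = 40  # assumed encoder frame duration in milliseconds

def chunk_latency_ms(chunk_size: int, frame_ms: int = FRAME_MS) -> int:
    """Audio buffered before each chunk of encoder frames is processed."""
    return chunk_size * frame_ms

def left_context_ms(chunk_size: int, left_context_chunks: int,
                    frame_ms: int = FRAME_MS) -> int:
    """Span of past audio the encoder attends to for each new chunk."""
    return left_context_chunks * chunk_size * frame_ms

print(chunk_latency_ms(24))    # 960 ms per chunk, matching the comment above
print(left_context_ms(24, 8))  # 7680 ms of left context
```

Larger chunk sizes raise latency but tend to improve accuracy, so the pair (24, 8) is one point on that trade-off rather than a fixed requirement.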

Details of model

~100M parameters, 12-layer Conformer encoder, Transducer decoder (LSTM prediction network)
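Since this variant transcribes without diacritics, diacritized reference text (e.g. transcripts carrying tone marks) can be normalized before scoring against the model's output. A minimal sketch using Unicode NFD decomposition to drop combining marks; the diacritized rendering of the example sentence below is hypothetical, and note that base letters such as ɔ and ɖ are ordinary characters, not diacritics, so they survive:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining diacritical marks (e.g. tone accents) while
    keeping base characters such as ɔ and ɖ intact."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))

# hypothetical diacritized version of the example output above
print(strip_diacritics("huzuhuzu gɔ̀ngɔ́n ɖé ɖò dandan"))
# -> huzuhuzu gɔngɔn ɖe ɖo dandan
```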

Details of training

  • pretrained using BEST-RQ on 700 hours for 400k steps

    • 140 hours of Fongbé from:
      • FFSTC 2 + beethogedeon/fongbe-speech (~40 hours)
      • cappfm (~100 hours)
    • 140 hours of English and French (from LibriSpeech)
    • 140 hours of Hausa and Yoruba from VoxLingua107, CommonVoice 23.0, and BibleTTS
  • finetuned with the Transducer loss on the training sets of:

    • FFSTC 2
    • beethogedeon/fongbe-speech
    • a small portion of automatically generated transcriptions from cappfm audio
    • African Accented French (https://www.openslr.org/57/), filtered to remove entries with no transcription or audio shorter than 1 second (13 hours remained)
  • tokenizer: SentencePiece BPE with a vocabulary size of 100
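The filtering applied to the OpenSLR 57 data (dropping entries with no transcription or with audio under 1 second) can be sketched as a simple predicate over manifest entries; the `transcription` and `duration` field names here are assumptions about the manifest layout, not the actual recipe:

```python
MIN_DURATION_S = 1.0  # drop clips shorter than 1 second

def keep_entry(entry: dict) -> bool:
    """True if the manifest entry has a non-empty transcription and
    at least MIN_DURATION_S seconds of audio."""
    text = (entry.get("transcription") or "").strip()
    return bool(text) and entry.get("duration", 0.0) >= MIN_DURATION_S

# toy manifest illustrating the three cases
manifest = [
    {"wav": "a.wav", "duration": 2.4, "transcription": "un exemple"},
    {"wav": "b.wav", "duration": 0.6, "transcription": "trop court"},
    {"wav": "c.wav", "duration": 3.1, "transcription": ""},
]
kept = [e for e in manifest if keep_entry(e)]
print([e["wav"] for e in kept])  # -> ['a.wav']
```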
# other citation coming soon

# dataset citation
@inproceedings{kponou25_interspeech,
  title     = {{Extending the Fongbe to French Speech Translation Corpus:  resources, models and benchmark}},
  author    = {D. Fortuné Kponou and Salima Mdhaffar and Fréjus A. A. Laleye and Eugène C. Ezin and Yannick Estève},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {4533--4537},
  doi       = {10.21437/Interspeech.2025-1801},
  issn      = {2958-1796},
}