
This model is released for non-commercial research and educational purposes only.


We strongly recommend verifying all outputs against original audio, especially when working with sensitive recordings.


Wav2Vec2-BERT for Northeastern Yiddish ASR (Phonemic Orthography)

This model is Wav2Vec2-BERT (w2v-BERT 2.0) fine-tuned on a subset of the Corpus of Spoken Yiddish in Europe (CSYE) for automatic speech recognition (ASR) of Northeastern Yiddish. It outputs a phonemic representation of Yiddish in a Hebrew-based orthography, encoded as precomposed Unicode characters. To respell this output in standard Yiddish orthography, transliterate it and then detransliterate it with the yiddish package.
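Because the output uses precomposed code points (for example U+FB2E for alef with patah), a reference transcript typed as base letter plus combining point will not match it character for character. NFC normalization does not fix this, since Unicode lists the Hebrew presentation forms as composition exclusions. The following is a minimal, stdlib-only sketch for bringing other text into the same precomposed encoding before comparison. The function name `to_precomposed` is hypothetical and is not part of the yiddish package (which may offer its own helpers). The mapping is derived from Python's Unicode database, so it covers every Hebrew presentation form with a canonical decomposition, not only the Yiddish ones.

```python
import unicodedata

# Hebrew presentation forms occupy U+FB1D..U+FB4F; several are precomposed Yiddish letters.
_FIRST, _LAST = 0xFB1D, 0xFB4F


def _build_table() -> dict:
    """Map each canonical decomposition (base letter + point[s]) to its precomposed form."""
    table = {}
    for cp in range(_FIRST, _LAST + 1):
        ch = chr(cp)
        # NFC leaves these forms decomposed because they are composition exclusions.
        decomposed = unicodedata.normalize("NFC", ch)
        if decomposed != ch:  # skips unassigned, compatibility-only, and combining code points
            table[decomposed] = ch
    return table


_TABLE = _build_table()
_MAX_LEN = max(len(key) for key in _TABLE)


def to_precomposed(text: str) -> str:
    """Rewrite Hebrew-script text so pointed letters use precomposed code points."""
    # NFC decomposes any Hebrew presentation forms, puts combining points in canonical
    # order, and keeps other scripts (e.g. Latin accented letters) composed.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        for n in range(_MAX_LEN, 1, -1):  # prefer the longest match (e.g. shin + dagesh + shin dot)
            chunk = text[i:i + n]
            if chunk in _TABLE:
                out.append(_TABLE[chunk])
                i += len(chunk)
                break
        else:  # no precomposed form starts here; copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)
```

Only canonical equivalents are unified: two separate yuds (U+05D9 U+05D9) and the double-yud ligature (U+05F2) are not canonically equivalent, so a sketch like this leaves them as they are.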

This is the PHON-44 model from: Bleaman, Isaac L. 2026. Automatic Transcription of Holocaust Testimonies in Yiddish: Orthographic Comparison and Cross-Domain Validation. Proceedings of the Second Workshop on Holocaust Testimonies as Language Resources (HTRes-2026). [Link coming soon.]

Description

  • Base model: facebook/w2v-bert-2.0
  • Orthography: Phonemic Hebrew-based script in precomposed Unicode
  • Training data: 30.83 hours of speech from 42 Northeastern Yiddish speakers in the CSYE
  • Training seed: 44 (lowest CSYE WER among the 5 random seeds tested)
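A minimal inference sketch with Hugging Face transformers follows. It rests on several assumptions not stated in this card: that the repository (ID ibleaman/w2v-bert-2.0-yiddish-northeastern, from the model page) ships a CTC head loadable with `Wav2Vec2BertForCTC` and a processor loadable with `AutoProcessor`, as is usual for w2v-BERT 2.0 ASR fine-tunes; that input audio is 16 kHz mono; and that you have accepted the terms and authenticated with a Hugging Face token. The helper names `load_model` and `transcribe` are hypothetical.

```python
import functools
from typing import Sequence

# Repository ID as shown on the model page.
MODEL_ID = "ibleaman/w2v-bert-2.0-yiddish-northeastern"
# Assumption: the w2v-BERT 2.0 feature extractor expects 16 kHz mono audio.
TARGET_SAMPLING_RATE = 16_000


@functools.lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    """Load processor and CTC model once (hypothetical helper; needs accepted terms + login)."""
    # Imported lazily so this file can be imported without torch/transformers installed.
    from transformers import AutoProcessor, Wav2Vec2BertForCTC

    processor = AutoProcessor.from_pretrained(model_id)
    model = Wav2Vec2BertForCTC.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return processor, model


def transcribe(waveform: Sequence[float], sampling_rate: int = TARGET_SAMPLING_RATE) -> str:
    """Greedy CTC transcription of one mono clip (hypothetical helper)."""
    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError(f"resample to {TARGET_SAMPLING_RATE} Hz first (got {sampling_rate} Hz)")
    import torch

    processor, model = load_model()
    inputs = processor(audio=waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)
    # batch_decode collapses repeated tokens and drops CTC blank (padding) tokens.
    return processor.batch_decode(predicted_ids)[0]
```

For example, with librosa installed, `waveform, sr = librosa.load("clip.wav", sr=16_000)` followed by `transcribe(waveform, sr)` would return the phonemic transcription (the file name is hypothetical). Caching the model keeps repeated calls cheap. Long recordings will likely need to be split into shorter segments first, both for memory and because the reported evaluations were run on segments.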

Performance

In-domain (CSYE, Holocaust testimonies)

13,111 segments from 12 speakers not seen in training

  • WER: 37.22%
  • CER: 12.81%

Cross-domain (REYD, audiobooks)

3,632 utterances from 2 narrators

  • WER: 24.32%
  • CER: 5.88%
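WER is conventionally (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference and N is the number of reference words; CER is the same ratio over characters. An in-domain WER of 37.22% therefore corresponds to roughly 37 word errors per 100 reference words. The card does not specify the scoring setup, so the following is only a minimal sketch under stated assumptions: it pools edits and reference lengths over the whole test set (averaging per-segment rates would give different numbers) and applies no text normalization. The function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1]


def _error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level rate: total edits divided by total reference units."""
    if len(refs) != len(hyps):
        raise ValueError("need exactly one hypothesis per reference")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / total


def wer(refs: List[str], hyps: List[str]) -> float:
    # Words are whitespace-separated tokens; no other normalization is applied.
    return _error_rate([r.split() for r in refs], [h.split() for h in hyps])


def cer(refs: List[str], hyps: List[str]) -> float:
    # Every character counts, including spaces.
    return _error_rate([list(r) for r in refs], [list(h) for h in hyps])
```

Both sides should be in the same Unicode encoding before scoring, since a precomposed letter and its decomposed spelling would otherwise count as a character error.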

Terms of Use

This model is fine-tuned on transcribed Holocaust survivor testimonies from the CSYE, which are sourced from the USC Shoah Foundation Visual History Archive. It may be used only for non-commercial research and educational purposes, including Holocaust testimony preservation and accessibility, consistent with the CSYE Terms of Use and the USC Shoah Foundation Terms of Use. Users must request access to the model through the access request form on this repository's page.

Citation

If you use this model, please cite the HTRes paper mentioned above.

Research Support

This material is based upon work supported by the National Science Foundation under Award No. BCS-2142797. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Model Details

  • Model size: 0.6B parameters (Safetensors weights, F32 tensors)

  • Model ID: ibleaman/w2v-bert-2.0-yiddish-northeastern (fine-tuned from facebook/w2v-bert-2.0)