You need to abide by Terms of Use to access this model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This model is released for non-commercial research and educational purposes only.
By requesting access, you agree to:
- Abide by the CSYE Terms of Use and USC Shoah Foundation Terms of Use
- Properly cite our research paper from the HTRes-2026 workshop [Link coming soon]
We strongly recommend verifying all outputs against original audio, especially when working with sensitive recordings.
Wav2Vec2-BERT for Northeastern Yiddish ASR (Phonemic Orthography)
This model is a version of W2v-BERT 2.0 (facebook/w2v-bert-2.0) fine-tuned on a subset of the
Corpus of Spoken Yiddish in Europe (CSYE) for
automatic speech recognition in Northeastern Yiddish. It outputs a
phonemic representation of Yiddish in a Hebrew-based orthography, encoded in
precomposed Unicode. To respell this output in standard Yiddish orthography,
transliterate the text and then detransliterate it with the
yiddish package.
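The respelling round trip described above might be sketched as follows. This is an illustration under assumptions, not the author's pipeline: it assumes the yiddish package exposes `transliterate` and `detransliterate` functions (check the package's own documentation for the installed version), and `respell_standard` and `has_combining_marks` are hypothetical helper names. The import is deferred so the block loads even when the package is not installed.

```python
import unicodedata


def has_combining_marks(text: str) -> bool:
    """Return True if any character is a combining mark (e.g. a separate Hebrew point)."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


def respell_standard(phonemic_text: str) -> str:
    """Round-trip phonemic output through romanization to obtain standard spelling."""
    # Lazy import: assumes `pip install yiddish` has been run.
    import yiddish

    # Assumed API of the yiddish package: transliterate() and detransliterate().
    romanized = yiddish.transliterate(phonemic_text)
    return yiddish.detransliterate(romanized)


# Alef followed by a separate pasekh point (decomposed) vs. the single
# precomposed code point U+FB2E.
decomposed = "\u05D0\u05B7"
precomposed = "\uFB2E"
print(has_combining_marks(decomposed), has_combining_marks(precomposed))
```

The `has_combining_marks` check is only a sanity test for whether a string is fully precomposed. Unicode NFC normalization will not produce these Hebrew presentation forms, because they are excluded from canonical composition, so reference transcripts may need an explicit conversion before being compared with the model's output.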
This is the PHON-44 model from: Bleaman, Isaac L. 2026. Automatic Transcription of Holocaust Testimonies in Yiddish: Orthographic Comparison and Cross-Domain Validation. Proceedings of the Second Workshop on Holocaust Testimonies as Language Resources (HTRes-2026). [Link coming soon.]
Description
- Base model: facebook/w2v-bert-2.0
- Orthography: Phonemic Hebrew-based script in precomposed Unicode
- Training data: 30.83 hours from 42 Northeastern Yiddish speakers from CSYE
- Training seed: 44 (the seed with the lowest WER on CSYE among the 5 random seeds tested)
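A minimal sketch of running the model with the Hugging Face transformers ASR pipeline, assuming access to the repository has been granted, you are logged in to the Hub, and a transformers release with Wav2Vec2-BERT support is installed. The `transcribe` helper and the example file name are hypothetical; the repository ID is the one shown on this model page.

```python
MODEL_ID = "ibleaman/w2v-bert-2.0-yiddish-northeastern"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the phonemic Yiddish text."""
    # Lazy import so this block loads without transformers installed.
    from transformers import pipeline

    # Builds a CTC speech-recognition pipeline; the gated model is downloaded
    # on first use with the locally stored Hub credentials.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # Decoding an audio file from a path relies on ffmpeg being available.
    result = asr(audio_path)
    return result["text"]


if __name__ == "__main__":
    # "testimony_clip.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe("testimony_clip.wav"))
```

For long recordings, the pipeline's `chunk_length_s` argument may help keep memory use manageable; whether chunked decoding matches the reported accuracy has not been checked here.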
Performance
In-domain (CSYE, Holocaust testimonies)
13,111 segments from 12 unseen speakers
- WER: 37.22%
- CER: 12.81%
Cross-domain (REYD, audiobooks)
3,632 utterances from 2 narrators
- WER: 24.32%
- CER: 5.88%
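For readers who want to compute comparable scores on their own data, the sketch below shows one common way to calculate corpus-level WER and CER from Levenshtein edit distance. It is not the evaluation script used for the figures above: the text normalization applied in the paper is not described here, and this version splits words on whitespace and counts spaces as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit errors divided by total reference tokens, pooled over all pairs."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


def wer(references, hypotheses):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(references, hypotheses, str.split)


def cer(references, hypotheses):
    """Character error rate over individual code points, spaces included."""
    return error_rate(references, hypotheses, list)


print(wer(["a b c d"], ["a x c d"]), cer(["abcd"], ["abxd"]))
```

Because CER here operates on code points, the reference and the hypothesis should use the same Unicode composition (precomposed, as in the model's output) before scoring.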
Terms of Use
This model is fine-tuned on transcribed Holocaust survivor testimonies from the CSYE, sourced from the USC Shoah Foundation Visual History Archive. It may only be used for non-commercial research and educational purposes, including Holocaust testimony preservation and accessibility, consistent with the CSYE Terms of Use and the USC Shoah Foundation Terms of Use. Users must request access to the ASR model using the form above.
Citation
If you use this model, please cite the HTRes paper mentioned above.
Research Support
This material is based upon work supported by the National Science Foundation under Award No. BCS-2142797. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Model tree for ibleaman/w2v-bert-2.0-yiddish-northeastern
Base model
facebook/w2v-bert-2.0