Whisper Small - Merged Hindi & English (Task Arithmetic)
This model is a bilingual variant of OpenAI's Whisper Small. It is fine-tuned for Hindi transcription and then merged back with the base model using Task Arithmetic to recover the original English capability.
Model Details
Model Description
Standard fine-tuning on a single language often causes catastrophic forgetting, where the model loses accuracy on languages it originally handled (such as English). This model mitigates that by merging a Hindi-fine-tuned model with the original base model.
- Developed by: specialv
- Model type: Speech-to-Text (ASR)
- Language(s): Hindi (hi), English (en)
- Finetuned from model: openai/whisper-small
- Finetuning Dataset: google/fleurs (Hindi subset)
- Merging Method: Task Arithmetic ($W_{new} = W_{base} + \lambda(W_{fine} - W_{base})$)
Model Sources
- Base Model: openai/whisper-small
- Hindi Task Vector source: specialv/whisper-small-hi-fleurs
Uses
Direct Use
- Bilingual Transcription: Accurately transcribes both pure Hindi and pure English audio.
- Hinglish Support: Improved performance on code-switched speech (mixed Hindi and English) compared to the base or purely fine-tuned versions.
- Transcription of Long-form Audio: Supports chunking for files longer than 30 seconds.
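For the long-form case, a minimal sketch is shown below. It assumes the chunking is done through the `chunk_length_s` argument of the Transformers ASR pipeline; `build_long_form_pipeline` is a hypothetical helper name and `long_audio_hi.wav` is a placeholder file, neither of which comes from this card.

```python
def build_long_form_pipeline(
    model_id="specialv/whisper-small-merged-hi-en",
    chunk_length_s=30,
):
    """Build an ASR pipeline that splits long audio into fixed-length chunks.

    Assumption: 30-second chunks, matching Whisper's input window.
    """
    # Heavy dependencies are imported here so that defining the helper is cheap;
    # they are only loaded when a pipeline is actually requested.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # audio longer than this is processed in chunks
        device="cuda" if use_cuda else "cpu",
        torch_dtype=torch.float16 if use_cuda else torch.float32,
    )


# Example usage (placeholder file name, downloads the model on first call):
# pipe = build_long_form_pipeline()
# result = pipe("long_audio_hi.wav", generate_kwargs={"language": "hindi"})
# print(result["text"])
```

Building the pipeline once and reusing it across files avoids reloading the weights for every transcription.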
How to Get Started with the Model
Use the code below to transcribe audio in either Hindi or English:
from transformers import pipeline
import torch

pipe = pipeline(
    "automatic-speech-recognition",
    model="specialv/whisper-small-merged-hi-en",
    device="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
# For Hindi Transcription
result_hi = pipe("audio_hi.wav", generate_kwargs={"language": "hindi"})
print(result_hi["text"])
# For English Transcription
result_en = pipe("audio_en.wav", generate_kwargs={"language": "english"})
print(result_en["text"])
Training Details
Training Data
The task vector was derived by fine-tuning openai/whisper-small on the Hindi (hi_in) subset of Google FLEURS.
Merging Procedure
The model was created using Task Arithmetic. This involves calculating the "Hindi task vector" (the difference between the fine-tuned weights and the base weights) and adding it back to the original English-capable base model. The scaling factor is $\lambda = 0.6$:
$W_{new} = W_{\text{openai/whisper-small}} + 0.6 \times (W_{\text{specialv/whisper-small-hi-fleurs}} - W_{\text{openai/whisper-small}})$
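The merge step can be sketched in a few lines. This is an illustration under stated assumptions, not the script used to produce the released weights: `merge_task_arithmetic` is a hypothetical helper name, both checkpoints are assumed to share the same parameter names and shapes, and toy NumPy arrays stand in for the real tensors.

```python
import numpy as np


def merge_task_arithmetic(base_weights, fine_weights, lam):
    """Return base + lam * (fine - base) for every parameter.

    Assumes both dicts have identical keys and matching array shapes.
    """
    if base_weights.keys() != fine_weights.keys():
        raise ValueError("base and fine-tuned checkpoints must share parameter names")
    merged = {}
    for name, base in base_weights.items():
        task_vector = fine_weights[name] - base  # what fine-tuning changed
        merged[name] = base + lam * task_vector  # scaled step toward the fine-tuned model
    return merged


# Toy stand-ins for the two checkpoints (hypothetical parameter name).
base = {"encoder.weight": np.array([1.0, 2.0])}
fine = {"encoder.weight": np.array([2.0, 4.0])}

merged = merge_task_arithmetic(base, fine, lam=0.6)
print(merged["encoder.weight"])
```

With `lam=0` the result equals the base model and with `lam=1` it equals the fine-tuned model, so values in between trade Hindi gains against retained English accuracy.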
Evaluation
Quantitative Comparison
| Model | Hindi WER (Word Error Rate) ↓ | Relative Improvement |
|---|---|---|
| OpenAI Whisper Small (Base) | 80.15% | Baseline |
| Bilingual Merged Model (specialv) | 24.30% | +69.68% |
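The relative improvement in the table is the reduction in WER as a fraction of the baseline WER. The short check below reproduces it from the two reported values; the function name `relative_wer_reduction` is illustrative.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Percentage by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Values taken from the table above (both in percent).
reduction = relative_wer_reduction(80.15, 24.30)
print(f"{reduction:.2f}%")  # 69.68%
```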
Qualitative Comparison (Sample Transcription)
| Source | Text |
|---|---|
| Ground Truth | होस्टल खास तौर पर युवा लोगों के लिए होते हैं इनमें ज़्यादातर बीस साल की उम्र के लोग रुकते हैं हालांकि आपको अक्सर यहां बड़ी उम्र के यात्री भी मिल सकते हैं |
| Our Merged Model | होस्टल खास तौर पर युवा लोगों के लिए होते हैं इन में ज़ादातर बिस साल की उम्र के लोग रुकते हैं हालांकि आपको अक्सर या बड़ी उम्र के यात्री भी में ल सकते हैं (High Accuracy) |
| Base Model | अस्टल खाश्टर पर यूगा लोग के लिए होते हैं इन में यादा तर भिस साल की उमर के लोग रुकते हैं हालांकि आपको अक्सर या बडी उमर के याद्टे भी में रिए रहाते हैं (Poor Accuracy) |
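To put a number on a pair like the ones above, WER is commonly computed as the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition. This card does not state which tool or text normalization produced the reported WER, so results from this function may differ from the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution ("sat" -> "sit") and one deletion ("the").
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{score:.3f}")  # 0.333
```

The same function accepts the Hindi strings from the table, since it only splits on whitespace.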
| Language | Accuracy | Note |
|---|---|---|
| Hindi | High | Retains ~90% of fine-tuned accuracy |
| English | High | Significantly improved over the pure Hindi model |
| Hinglish | Improved | Better at recognizing English loanwords in Hindi |
Environmental Impact
- Hardware Type: Tesla T4 GPU (Google Colab)
- Hours used for fine-tuning: ~1.5 hours
- Merging Time: < 5 minutes