Whisper Small - Merged Hindi & English (Task Arithmetic)

This model is a bilingual (Hindi and English) variant of OpenAI's Whisper Small. It is optimized for Hindi transcription and uses Task Arithmetic merging to recover the base model's English capabilities.

Model Details

Model Description

Fine-tuning on a single language often causes catastrophic forgetting: the model loses accuracy on languages it previously handled well, such as English. This model mitigates that by merging a Hindi-fine-tuned checkpoint back into the original base model.

  • Developed by: specialv
  • Model type: Speech-to-Text (ASR)
  • Language(s): Hindi (hi), English (en)
  • Finetuned from model: openai/whisper-small
  • Finetuning Dataset: google/fleurs (Hindi subset)
  • Merging Method: Task Arithmetic ($W_{new} = W_{base} + \lambda(W_{fine} - W_{base})$)

Model Sources

Uses

Direct Use

  • Bilingual Transcription: Accurately transcribes both pure Hindi and pure English audio.
  • Hinglish Support: Improved performance on code-switched speech (mixed Hindi and English) compared to the base or purely fine-tuned versions.
  • Long-form Transcription: Supports chunked transcription of audio files longer than Whisper's 30-second input window.
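A minimal sketch of chunked long-form transcription with the transformers ASR pipeline is shown below. It assumes a recent transformers release, ffmpeg installed for decoding audio files, and a 30-second chunk length. The helper name transcribe_long_audio and its defaults are hypothetical and not part of this model's release. Passing language=None leaves language identification to Whisper, which may suit code-switched (Hinglish) audio, although this card reports no measurements for that setting.

```python
from typing import Optional


def transcribe_long_audio(path: str, language: Optional[str] = "hindi",
                          chunk_length_s: int = 30) -> str:
    """Transcribe an audio file of any length by splitting it into chunks.

    Hypothetical helper for illustration; not shipped with the model.
    """
    # Heavy imports are deferred so this module loads without torch/transformers.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    pipe = pipeline(
        "automatic-speech-recognition",
        model="specialv/whisper-small-merged-hi-en",
        device="cuda" if use_cuda else "cpu",
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        chunk_length_s=chunk_length_s,  # split the input into windows of this many seconds
    )

    generate_kwargs = {"task": "transcribe"}
    if language is not None:
        # Force the decoding language; leave it unset to let Whisper detect it.
        generate_kwargs["language"] = language

    result = pipe(path, generate_kwargs=generate_kwargs)
    return result["text"]
```

For example, transcribe_long_audio("podcast.wav", language=None) would transcribe a hypothetical file without forcing a language. The helper builds the pipeline on every call only to stay self-contained; in real use, create the pipeline once and reuse it.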

How to Get Started with the Model

Use the code below to transcribe audio in either Hindi or English:

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "automatic-speech-recognition",
    model="specialv/whisper-small-merged-hi-en",
    device="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

# For Hindi Transcription
result_hi = pipe("audio_hi.wav", generate_kwargs={"language": "hindi"})
print(result_hi["text"])

# For English Transcription
result_en = pipe("audio_en.wav", generate_kwargs={"language": "english"})
print(result_en["text"])
```

Training Details

Training Data

The task vector was derived from fine-tuning on the Google FLEURS Hindi (hi_in) dataset.

Merging Procedure

The model was created using Task Arithmetic. This involves computing the "Hindi task vector" (the difference between the fine-tuned weights and the base weights), scaling it by λ, and adding the scaled vector back to the original English-capable base model. The scaling factor used was λ = 0.6:

$W_{new} = W_{base} + 0.6\,(W_{fine} - W_{base})$, where $W_{base}$ are the weights of openai/whisper-small and $W_{fine}$ are those of specialv/whisper-small-hi-fleurs.
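The merge can be sketched in Python as follows. This is an illustrative reconstruction of the Task Arithmetic formula, not the author's script. task_arithmetic_merge and merge_whisper are hypothetical names. The sketch assumes both checkpoints share an identical architecture and parameter names, and it copies any non-floating-point tensors from the base model unchanged.

```python
from typing import Any, Dict, Mapping


def task_arithmetic_merge(base: Mapping[str, Any], fine: Mapping[str, Any],
                          lam: float = 0.6) -> Dict[str, Any]:
    """Return W_base + lam * (W_fine - W_base) for every parameter.

    Works for any values supporting +, - and scalar * (floats, NumPy arrays,
    torch tensors).
    """
    if set(base) != set(fine):
        raise ValueError("checkpoints have different parameter names")
    merged = {}
    for name, w_base in base.items():
        # Integer buffers (torch tensors that are not floating point) are kept as-is.
        is_float = getattr(w_base, "is_floating_point", None)
        if callable(is_float) and not is_float():
            merged[name] = w_base
            continue
        task_vector = fine[name] - w_base  # the "Hindi task vector" for this parameter
        merged[name] = w_base + lam * task_vector
    return merged


def merge_whisper(base_id: str, fine_id: str, lam: float = 0.6):
    """Hypothetical helper: merge two Whisper checkpoints from the Hugging Face Hub."""
    # Imported lazily: only needed when merging real checkpoints.
    from transformers import WhisperForConditionalGeneration

    base = WhisperForConditionalGeneration.from_pretrained(base_id)
    fine = WhisperForConditionalGeneration.from_pretrained(fine_id)
    merged_weights = task_arithmetic_merge(base.state_dict(), fine.state_dict(), lam)
    base.load_state_dict(merged_weights)  # overwrite the base model's weights in place
    return base
```

Under these assumptions, merge_whisper("openai/whisper-small", "specialv/whisper-small-hi-fleurs", lam=0.6) would produce the configuration described here. The fine-tuned repository ID is taken from the formula and is assumed to be available on the Hub. The result can then be written out with save_pretrained. Setting lam=0 returns the base model and lam=1 returns the fine-tuned model, so λ trades Hindi accuracy against retained English ability.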

Evaluation

Quantitative Comparison

| Model | Hindi WER (Word Error Rate) ↓ | Relative Improvement |
|---|---|---|
| OpenAI Whisper Small (Base) | 80.15% | Baseline |
| Bilingual Merged Model (specialv) | 24.30% | +69.68% |
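The card does not state how WER was computed (tokenization, text normalization, or library). As an illustration only, the sketch below computes word-level WER with a from-scratch edit distance over whitespace-separated words, plus the relative reduction reported in the table. Tools such as jiwer are commonly used in practice and may normalize text differently, so numbers from this sketch are not guaranteed to match the table.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping one row of the DP table at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline - new) / baseline * 100.0


print(f"{relative_reduction(80.15, 24.30):.2f}%")  # prints 69.68%
```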

Qualitative Comparison (Sample Transcription)

| Source | Text |
|---|---|
| Ground Truth | होस्टल खास तौर पर युवा लोगों के लिए होते हैं इनमें ज़्यादातर बीस साल की उम्र के लोग रुकते हैं हालांकि आपको अक्सर यहां बड़ी उम्र के यात्री भी मिल सकते हैं |
| Our Merged Model | होस्टल खास तौर पर युवा लोगों के लिए होते हैं इन में ज़ादातर बिस साल की उम्र के लोग रुकते हैं हालांकि आपको अक्सर या बड़ी उम्र के यात्री भी में ल सकते हैं (High Accuracy) |
| Base Model | अस्टल खाश्टर पर यूगा लोग के लिए होते हैं इन में यादा तर भिस साल की उमर के लोग रुकते हैं हालांकि आपको अक्सर या बडी उमर के याद्टे भी में रिए रहाते हैं (Poor Accuracy) |

| Language | Accuracy | Note |
|---|---|---|
| Hindi | High | Retains ~90% of fine-tuned accuracy |
| English | High | Significantly improved over the pure Hindi model |
| Hinglish | Improved | Better at recognizing English loanwords in Hindi |

Environmental Impact

  • Hardware Type: Tesla T4 GPU (Google Colab)
  • Hours used for fine-tuning: ~1.5 hours
  • Merging Time: < 5 minutes

Checkpoint Details

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32