Vaani-Whisper
Part of the Vaani-Whisper collection: Whisper models fine-tuned using Vaani data along with other datasets.
This is a fine-tuned version of OpenAI's Whisper-Large-V3, trained on approximately 718 hours of transcribed Hindi speech from multiple datasets.
This model can be used with the `pipeline` function from the Transformers library.
```python
import torch
from transformers import pipeline

audio = "path to the audio file to be transcribed"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "ARTPARK-IISc/whisper-large-v3-vaani-hindi"

transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)
# Force Hindi transcription output
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")
print("Transcription:", transcribe(audio)["text"])
```
The model was fine-tuned using the following datasets: Vaani, Gramvaani, IndicVoices, Fleurs, IndicTTS, and Common Voice.
The performance of the model was evaluated using multiple datasets, and the evaluation results are provided below.
| Dataset | WER (%) |
|---|---|
| Gramvaani | 25.11 |
| Fleurs | 11.20 |
| MUCS | 14.60 |
| Commonvoice | 13.84 |
| Kathbath | 8.85 |
| Kathbath Noisy | 11.80 |
| Vaani | 24.66 |
| RESPIN | 7.36 |
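The WER figures above are word error rates, i.e. the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. As an illustration only, here is a minimal standard-library sketch of that computation; the published numbers were presumably produced with a dedicated evaluation library, and the text normalisation applied before scoring is not specified here.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution plus one deletion over a 4-word reference -> 2/4 = 0.5
print(wer("a b c d", "a x c"))  # 0.5
```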
If you use this model, please cite the following:
```bibtex
@misc{pulikodan2026vaanicapturinglanguagelandscape,
  title={VAANI: Capturing the language landscape for an inclusive digital India},
  author={Sujith Pulikodan and Abhayjeet Singh and Agneedh Basu and Nihar Desai and Pavan Kumar J and Pranav D Bhat and Raghu Dharmaraju and Ritika Gupta and Sathvik Udupa and Saurabh Kumar and Sumit Sharma and Vaibhav Vishwakarma and Visruth Sanka and Dinesh Tewari and Harsh Dhand and Amrita Kamat and Sukhwinder Singh and Shikhar Vashishth and Partha Talukdar and Raj Acharya and Prasanta Kumar Ghosh},
  year={2026},
  eprint={2603.28714},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2603.28714},
}
```