# Whisper Small Uyghur LoRA (Fine-tuned)
## چۈشەندۈرۈشى (Description in Uyghur)
openai/whisper-small ئاساسىدا ئۇيغۇرچە ئاۋازنى تونۇش ئۈچۈن مەخسۇس تەربىيەلەنگەن. بىز LoRA تېخنىكىسىنى ئىشلىتىپ، ئۇيغۇرچە ئاۋازلارنى يۇقىرى ئېنىقلىقتا تېكىستكە ئايلاندۇرۇش مەقسىتىگە يەتتۇق.
- تەربىيەلەش سانلىق مەلۇماتى: Mozilla Common Voice (Uyghur)
- قاتتىق دېتال: NVIDIA GeForce RTX 3060 (9 سائەت تەربىيەلەنگەن)
- مەقسەت: ئۇيغۇر تىلىنىڭ رەقەملىك ساھەدىكى تەرەققىياتى ۋە تىلنى قوغداشقا تۆھپە قوشۇش.
## Model Description (English)
This model is a fine-tuned version of OpenAI's Whisper Small for Uyghur automatic speech recognition (ASR). It was trained with LoRA (Low-Rank Adaptation), producing a lightweight adapter (approx. 13.5 MB) instead of a full copy of the model weights.
- Data Source: Mozilla Common Voice / Data Collective
- Hardware: Trained on a single NVIDIA RTX 3060 GPU for approximately 9 hours.
- Accuracy: Fine-tuned for accurate recognition of spoken Uyghur.
## ⚙️ Training Details
- Base Model: openai/whisper-small
- Method: PEFT (LoRA)
- Training Time: ~9 hours
- Optimizer: AdamW
- Adapter Size: ~13.5 MB
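The ~13.5 MB adapter size is consistent with a common LoRA setup. As a rough sanity check (the rank and target modules below are assumptions for illustration, not details confirmed by this card), applying rank-32 LoRA to the query and value projections of every attention block in Whisper Small gives almost exactly that figure:

```python
# Back-of-the-envelope LoRA adapter size for openai/whisper-small.
# ASSUMPTIONS (not stated in the card): rank r = 32, LoRA applied to
# q_proj and v_proj only, adapter weights stored in float32.
d_model = 768          # hidden size of whisper-small
encoder_layers = 12
decoder_layers = 12

r = 32                                       # assumed LoRA rank
params_per_proj = r * (d_model + d_model)    # A: (r, d) plus B: (d, r)

# Encoder layers have self-attention only; decoder layers have
# self-attention and cross-attention.
attn_blocks = encoder_layers + 2 * decoder_layers   # 36 attention blocks
projections = attn_blocks * 2                       # q_proj and v_proj each

total_params = projections * params_per_proj
size_mib = total_params * 4 / 2**20                 # float32 bytes -> MiB
print(f"{total_params:,} params = {size_mib:.1f} MiB")
```

Under these assumptions the adapter holds about 3.5M trainable parameters, roughly 13.5 MiB in float32, matching the size reported above.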
## ⚠️ Disclaimer (ئاگاھلاندۇرۇش)
English: This model is released for research, educational, and language preservation purposes only. The developer strongly opposes the use of this technology for mass surveillance, human rights violations, or any form of discrimination.
ئۇيغۇرچە: بۇ مودېل پەقەت تەتقىقات، مائارىپ ۋە تىلنى قوغداش مەقسىتىدە ئېلان قىلىندى. بۇ تېخنىكىنى كۆزىتىش، كىشىلىك ھوقۇققا دەخلى-تەرۇز قىلىش ياكى كەمسىتىش خاراكتېرلىك ئىشلارغا ئىشلىتىشكە قەتئىي قارشى تۇرىمىز.
## How to use
You can load this model with PEFT and Transformers. Since the processor is not included in this adapter-only repository, load it from the base model.
```python
import torch
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# 1. Set up model IDs
base_model_id = "openai/whisper-small"
peft_model_id = "xiwol/whisper-small-uyghur"

# 2. Load the processor from the base model
#    (language and task are set for Uyghur transcription)
processor = WhisperProcessor.from_pretrained(
    base_model_id, language="uyghur", task="transcribe"
)

# 3. Load the base model
base_model = WhisperForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# 4. Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# 5. Inference: load your audio file at a 16 kHz sampling rate
audio, _ = librosa.load("your_audio_file.mp3", sr=16000)
input_features = processor(
    audio, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device, dtype=model.dtype)

# Generate the transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(f"Transcription: {transcription}")
```
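Whisper models expect 16 kHz mono input. If your audio arrives at a different sampling rate and you would rather not depend on librosa, a minimal linear-interpolation resampler can be written with NumPy alone. This is a quick sketch, not a production resampler: it applies no anti-aliasing filter, so prefer librosa or torchaudio when quality matters.

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampling to 16 kHz.

    Fine for a quick experiment; real pipelines should use a proper
    resampler (librosa, torchaudio) with anti-aliasing.
    """
    target_sr = 16000
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_out = np.linspace(0.0, duration, n_out, endpoint=False)
    t_in = np.arange(len(audio)) / orig_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)

# One second of a 440 Hz tone at 8 kHz -> 16000 samples after resampling
tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000).astype(np.float32)
print(len(resample_to_16k(tone, 8000)))  # 16000
```

The returned array can be passed to the processor exactly as in the example above.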