emredeveloper/whisper-small-tr

@@ -1,5 +1,5 @@
 ---
-language: en
 license: mit
 tags:
 - audio
@@ -26,15 +26,20 @@ model-index:
 # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
-This model is a fine-tuned version of the `openai/whisper-small` base model, optimized for Turkish Automatic Speech Recognition (ASR).
 ## Model Description
-Whisper models are multilingual and multitask models pre-trained on diverse audio data. This project fine-tunes the `whisper-small` model on the `Codyfederer/tr-full-dataset` to improve Turkish ASR performance.
 ## Training Data
-The model uses the `Codyfederer/tr-full-dataset`, consisting of 3000 Turkish audio-transcription samples, split into 90% training and 10% testing.
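The 90/10 split described above works out to 2,700 training and 300 test samples. A minimal sketch of such a split follows, using only the standard library; the shuffling step, the seed value, and the `split_indices` helper are assumptions for illustration, since the card does not say how the split was actually made.

```python
import random


def split_indices(n_samples, test_fraction=0.1, seed=42):
    """Shuffle sample indices and split them into train and test lists.

    The seed and the shuffle are illustrative assumptions, not the
    procedure used to train this model.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # The first n_test shuffled indices form the test set, the rest train
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(3000)
print(len(train_idx), len(test_idx))  # 2700 300
```

With only 300 held-out samples, each test utterance carries noticeable weight in the reported metrics, so a fixed seed helps keep evaluation runs comparable.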
 ## Training Parameters
@@ -71,32 +76,94 @@ Training utilized the Hugging Face `Trainer` with the following `Seq2SeqTraining
 Test set evaluation results:
-- Word Error Rate (WER): 7.75%
-- Character Error Rate (CER): 1.95%
-- Loss: 0.1321
-### Comparison with Base Model
-For an example audio file (`/content/audio.mp3`):
-- Base Whisper Model: WER 23.53%, CER 2.82%
-- Fine-Tuned Model: WER 11.76%, CER 2.11%
-The fine-tuned model shows significant improvement in Turkish ASR performance.
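The WER and CER figures above are edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length in words or characters. A minimal sketch in plain Python is shown here; the example sentences are hypothetical, and the card does not state which tool or text normalization produced the reported scores, so results from this sketch may differ from them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters here (an assumption)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference and model output, for illustration only
reference = "bugün hava çok güzel"
hypothesis = "bugün hava çok güzell"
print(f"WER: {wer(reference, hypothesis):.2%}")  # WER: 25.00%
print(f"CER: {cer(reference, hypothesis):.2%}")  # CER: 5.00%
```

A single misspelled word costs a full word error but only one character error, which is why CER is typically much lower than WER for the same transcript.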
 ## Usage
 ```python
 from transformers import pipeline
 import torch
-pipeline = pipeline(
     task="automatic-speech-recognition",
     model="emredeveloper/whisper-small-tr",
     chunk_length_s=30,
     device="cuda" if torch.cuda.is_available() else "cpu",
 )
-audio_file = "path/to/your/audio.flac"
-text = pipeline(audio_file)["text"]
-print(text)

 ---
+language: tr
 license: mit
 tags:
 - audio
 # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
+This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).
 ## Model Description
+Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.
+- **Base Model:** openai/whisper-small
+- **Language:** Turkish (tr)
+- **Task:** Automatic Speech Recognition
+- **Dataset:** Codyfederer/tr-full-dataset
 ## Training Data
+The model uses the `Codyfederer/tr-full-dataset`, consisting of 3,000 Turkish audio-transcription samples, split into 90% training and 10% testing.
 ## Training Parameters
 Test set evaluation results:
+- **Word Error Rate (WER):** 7.75%
+- **Character Error Rate (CER):** 1.95%
+- **Loss:** 0.1321
+The fine-tuned model improves on the base model: on a single example audio file, WER dropped from 23.53% to 11.76% and CER from 2.82% to 2.11%.
 ## Usage
+### Basic Usage
 ```python
 from transformers import pipeline
 import torch
+pipe = pipeline(
     task="automatic-speech-recognition",
     model="emredeveloper/whisper-small-tr",
     chunk_length_s=30,
     device="cuda" if torch.cuda.is_available() else "cpu",
 )
+audio_file = "path/to/your/audio.mp3"
+result = pipe(audio_file)
+print(result["text"])
+```
+### Gradio Demo
+```python
+import gradio as gr
+from transformers import pipeline
+pipe = pipeline(
+    "automatic-speech-recognition",
+    model="emredeveloper/whisper-small-tr"
+)
+def transcribe(audio):
+    if audio is None:
+        return ""
+    return pipe(audio)["text"]
+demo = gr.Interface(
+    fn=transcribe,
+    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
+    outputs="text",
+    title="Turkish Speech Recognition",
+    description="Upload or record Turkish audio to transcribe."
+)
+demo.launch(share=True)
+```
+### Advanced Usage
+```python
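+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+import torch
+import librosa
+# Run on the GPU when one is available
+device = "cuda" if torch.cuda.is_available() else "cpu"
+processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
+model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr").to(device)
+# Whisper expects 16 kHz mono audio; librosa resamples on load
+audio, sr = librosa.load("audio.mp3", sr=16000)
+# Convert the waveform to log-mel features (padded or truncated to 30 seconds)
+input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features.to(device)
+# Generate token IDs and decode them to text
+predicted_ids = model.generate(input_features)
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+print(transcription[0])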
+```
+## Limitations
+- Trained on 3,000 samples, which may limit generalization
+- Performance may vary on noisy audio or non-standard dialects
+- Best results with clear audio sampled at 16 kHz
+## Citation
+```bibtex
+@misc{whisper-small-tr,
+  author = {emredeveloper},
+  title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
+}
+```
+## Acknowledgments
+- Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
+- Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
+- Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)