emredeveloper committed on
Commit 9324e72 · verified · 1 Parent(s): 19fc62a

Update README.md

Files changed (1)
  1. README.md +86 -19
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- language: en
3
  license: mit
4
  tags:
5
  - audio
@@ -26,15 +26,20 @@ model-index:
26
 
27
  # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
28
 
29
- This model is a fine-tuned version of the `openai/whisper-small` base model, optimized for Turkish Automatic Speech Recognition (ASR).
30
 
31
  ## Model Description
32
 
33
- Whisper models are multilingual and multitask models pre-trained on diverse audio data. This project fine-tunes the `whisper-small` model on the `Codyfederer/tr-full-dataset` to improve Turkish ASR performance.
34
 
35
  ## Training Data
36
 
37
- The model uses the `Codyfederer/tr-full-dataset`, consisting of 3000 Turkish audio-transcription samples, split into 90% training and 10% testing.
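To illustrate the 90/10 split described in this line, here is a minimal sketch in plain Python that shuffles sample indices and holds out 10% of them for testing. The helper name `split_indices`, the seed value, and the use of the standard `random` module are assumptions made for illustration; the README does not say how the split was actually produced.

```python
import random


def split_indices(n_samples, test_fraction=0.1, seed=42):
    """Return (train_indices, test_indices) for a shuffled hold-out split.

    Hypothetical helper: the seed and the shuffling strategy are assumptions,
    not taken from the model card.
    """
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 3,000 samples with a 10% hold-out, matching the sizes stated in the README.
train_idx, test_idx = split_indices(3000)
print(len(train_idx), len(test_idx))
```

With 3,000 samples this yields 2,700 training and 300 test indices, and the two index lists do not overlap.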
38
 
39
  ## Training Parameters
40
 
@@ -71,32 +76,94 @@ Training utilized the Hugging Face `Trainer` with the following `Seq2SeqTraining
71
 
72
  Test set evaluation results:
73
 
74
- - Word Error Rate (WER): 7.75%
75
- - Character Error Rate (CER): 1.95%
76
- - Loss: 0.1321
77
-
78
- ### Comparison with Base Model
79
-
80
- For an example audio file (`/content/audio.mp3`):
81
 
82
- - Base Whisper Model: WER 23.53%, CER 2.82%
83
- - Fine-Tuned Model: WER 11.76%, CER 2.11%
84
-
85
- The fine-tuned model shows significant improvement in Turkish ASR performance.
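For readers who want to see what the WER and CER figures measure, the sketch below computes both from a plain edit distance and then derives the relative error reduction from the single-file numbers quoted in this section. It uses only the standard library. The function names and the example sentence pair are hypothetical, and the README does not state which tool produced its reported metrics, so this is not necessarily the author's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(base, tuned):
    """Fraction by which an error rate dropped relative to the baseline."""
    return (base - tuned) / base


# Hypothetical sentence pair, for illustration only.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava çok güzeldi"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")

# Single-file figures quoted above (base vs. fine-tuned).
print(f"Relative WER reduction: {relative_reduction(23.53, 11.76):.1%}")
print(f"Relative CER reduction: {relative_reduction(2.82, 2.11):.1%}")
```

On the quoted single-file numbers, this works out to roughly a 50% relative reduction in WER and about 25% in CER. Because that comparison covers one audio file, it is an illustration rather than a benchmark.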
86
 
87
  ## Usage
88
 
 
89
  ```python
90
  from transformers import pipeline
91
  import torch
92
 
93
- pipeline = pipeline(
94
  task="automatic-speech-recognition",
95
  model="emredeveloper/whisper-small-tr",
96
  chunk_length_s=30,
97
  device="cuda" if torch.cuda.is_available() else "cpu",
98
  )
99
 
100
- audio_file = "path/to/your/audio.flac"
101
- text = pipeline(audio_file)["text"]
102
- print(text)
 
1
  ---
2
+ language: tr
3
  license: mit
4
  tags:
5
  - audio
 
26
 
27
  # whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
28
 
29
+ This model is a fine-tuned version of `openai/whisper-small` optimized for Turkish Automatic Speech Recognition (ASR).
30
 
31
  ## Model Description
32
 
33
+ Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.
34
+
35
+ - **Base Model:** openai/whisper-small
36
+ - **Language:** Turkish (tr)
37
+ - **Task:** Automatic Speech Recognition
38
+ - **Dataset:** Codyfederer/tr-full-dataset
39
 
40
  ## Training Data
41
 
42
+ The model uses the `Codyfederer/tr-full-dataset`, consisting of 3,000 Turkish audio-transcription samples, split into 90% training and 10% testing.
43
 
44
  ## Training Parameters
45
 
 
76
 
77
  Test set evaluation results:
78
 
79
+ - **Word Error Rate (WER):** 7.75%
80
+ - **Character Error Rate (CER):** 1.95%
81
+ - **Loss:** 0.1321
82
 
83
+ The fine-tuned model shows significant improvement in Turkish ASR performance compared to the base model.
84
 
85
  ## Usage
86
 
87
+ ### Basic Usage
88
  ```python
89
  from transformers import pipeline
90
  import torch
91
 
92
+ pipe = pipeline(
93
  task="automatic-speech-recognition",
94
  model="emredeveloper/whisper-small-tr",
95
  chunk_length_s=30,
96
  device="cuda" if torch.cuda.is_available() else "cpu",
97
  )
98
 
99
+ audio_file = "path/to/your/audio.mp3"
100
+ result = pipe(audio_file)
101
+ print(result["text"])
102
+ ```
103
+
104
+ ### Gradio Demo
105
+ ```python
106
+ import gradio as gr
107
+ from transformers import pipeline
108
+
109
+ pipe = pipeline(
110
+ "automatic-speech-recognition",
111
+ model="emredeveloper/whisper-small-tr"
112
+ )
113
+
114
+ def transcribe(audio):
115
+ if audio is None:
116
+ return ""
117
+ return pipe(audio)["text"]
118
+
119
+ demo = gr.Interface(
120
+ fn=transcribe,
121
+ inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
122
+ outputs="text",
123
+ title="Turkish Speech Recognition",
124
+ description="Upload or record Turkish audio to transcribe."
125
+ )
126
+
127
+ demo.launch(share=True)
128
+ ```
129
+
130
+ ### Advanced Usage
131
+ ```python
132
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
133
+ import torch
134
+ import librosa
135
+
136
+ processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
137
+ model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")
138
+
139
+ audio, sr = librosa.load("audio.mp3", sr=16000)
140
+ input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
141
+
142
+ predicted_ids = model.generate(input_features)
143
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
144
+
145
+ print(transcription[0])
146
+ ```
147
+
148
+ ## Limitations
149
+
150
+ - Trained on 3,000 samples, which may limit generalization
151
+ - Performance may vary on noisy audio or non-standard dialects
152
+ - Best results with clear audio at 16kHz sampling rate
153
+
154
+ ## Citation
155
+ ```bibtex
156
+ @misc{whisper-small-tr,
157
+ author = {emredeveloper},
158
+ title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
159
+ year = {2025},
160
+ publisher = {Hugging Face},
161
+ howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
162
+ }
163
+ ```
164
+
165
+ ## Acknowledgments
166
+
167
+ - Base model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
168
+ - Dataset: [Codyfederer/tr-full-dataset](https://huggingface.co/datasets/Codyfederer/tr-full-dataset)
169
+ - Built with [Hugging Face Transformers](https://github.com/huggingface/transformers)