whisper-large-v3-yue-lora-dec-enc4

A fine-tune of openai/whisper-large-v3 for Cantonese (yue) speech recognition, trained on Common Voice.

Evaluation Results

Metric                 Value
CER (no punctuation)   3.28%
CER (raw)              4.03%
Eval Loss              0.0625
Best Step              36000
Best Epoch             9.08
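
The two CER figures differ only in whether punctuation is counted. As a rough illustration of that distinction, here is a minimal sketch of character error rate computed as character-level edit distance divided by reference length. The helper names (`strip_punctuation`, `edit_distance`, `cer`) are hypothetical, and the normalization shown (dropping Unicode punctuation characters) is an assumption; the exact evaluation script behind the numbers above is not part of this card.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance, keeping one table row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, ignore_punct: bool = False) -> float:
    """Edit distance divided by reference length (assumed normalization)."""
    if ignore_punct:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative strings, not taken from the evaluation data.
reference = "你好，世界。"
hypothesis = "你好世界"
print(f"raw: {cer(reference, hypothesis):.2%}")
print(f"no punctuation: {cer(reference, hypothesis, ignore_punct=True):.2%}")
```

In this toy pair the only errors are two missing punctuation marks, so the raw score is penalized while the punctuation-free score is zero, which mirrors why the raw CER in the table is the higher of the two.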

Training History

Step   Epoch  Eval Loss  CER (nopunct)  CER (raw)
1000   0.03   0.2421     9.02%          11.76%
2000   0.05   0.2165     9.05%          11.38%
3000   0.08   0.2069     8.57%          11.01%
4000   1.01   0.1925     8.55%          10.60%
5000   1.04   0.1785     7.53%          9.69%
6000   1.06   0.1698     7.36%          9.47%
7000   1.09   0.1639     7.13%          9.23%
8000   2.02   0.1551     6.74%          8.63%
9000   2.05   0.1476     6.42%          8.43%
10000  2.07   0.1371     6.22%          7.99%
11000  2.10   0.1374     6.03%          7.91%
12000  3.03   0.1248     6.12%          7.79%
13000  3.05   0.1188     5.74%          7.34%
14000  3.08   0.1143     5.34%          6.96%
15000  4.01   0.1095     5.25%          6.60%
16000  4.04   0.1070     5.26%          6.52%
17000  4.06   0.0989     5.06%          6.28%
18000  4.09   0.0969     4.69%          5.96%
19000  5.02   0.0972     4.88%          6.03%
20000  5.04   0.0920     4.59%          5.78%
21000  5.07   0.0873     4.19%          5.22%
22000  5.10   0.0890     4.49%          5.56%
23000  6.03   0.0847     4.11%          5.18%
24000  6.05   0.0832     4.15%          5.32%
25000  6.08   0.0800     3.87%          4.91%
26000  7.01   0.0763     4.05%          4.97%
27000  7.04   0.0734     3.84%          4.64%
28000  7.06   0.0724     3.74%          4.65%
29000  7.09   0.0722     3.60%          4.53%
30000  8.02   0.0707     3.60%          4.47%
31000  8.04   0.0683     3.36%          4.17%
32000  8.07   0.0669     3.41%          4.17%
33000  8.10   0.0645     3.37%          4.19%
34000  9.03   0.0632     3.36%          4.16%
35000  9.05   0.0634     3.30%          4.10%
36000  9.08   0.0625     3.28%          4.03%
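
The reported best step can be cross-checked against the history. The sketch below parses the last four rows of the table and picks the row with the lowest punctuation-free CER. Which metric was actually used to select the best checkpoint is not stated in this card, so both the selection key and the helper name `parse_rows` are assumptions made for illustration.

```python
# Last four rows of the training history, copied from the table:
# step, epoch, eval loss, CER (nopunct), CER (raw)
HISTORY = """\
33000 8.10 0.0645 3.37% 4.19%
34000 9.03 0.0632 3.36% 4.16%
35000 9.05 0.0634 3.30% 4.10%
36000 9.08 0.0625 3.28% 4.03%
"""


def parse_rows(text: str) -> list[dict]:
    """Turn whitespace-separated table rows into dictionaries of numbers."""
    rows = []
    for line in text.strip().splitlines():
        step, epoch, loss, cer_nopunct, cer_raw = line.split()
        rows.append(
            {
                "step": int(step),
                "epoch": float(epoch),
                "eval_loss": float(loss),
                "cer_nopunct": float(cer_nopunct.rstrip("%")),
                "cer_raw": float(cer_raw.rstrip("%")),
            }
        )
    return rows


rows = parse_rows(HISTORY)

# Assumed selection rule: lowest CER without punctuation.
best = min(rows, key=lambda r: r["cer_nopunct"])
print(best["step"], best["epoch"], best["cer_nopunct"])
```

For these rows, selecting by eval loss instead gives the same step, so the choice of key does not change the outcome here.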

Final Evaluation

Split         CER (raw)  CER (nopunct)
test_yue      4.58%      4.03%
holdback_yue  5.21%      4.65%

Training Details

Training Metrics

TensorBoard logs are included in the runs/ directory of this repository.

# Clone and view locally
git clone https://huggingface.co/awong-dev/whisper-large-v3-yue-lora-dec-enc4
tensorboard --logdir whisper-large-v3-yue-lora-dec-enc4/runs

Usage

import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "awong-dev/whisper-large-v3-yue-lora-dec-enc4"

processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Load audio; torchaudio returns a (channels, samples) tensor
audio, sr = torchaudio.load("audio.mp3")

# Down-mix to mono so a stereo file is not treated as a batch of two clips
audio = audio.mean(dim=0)

# Whisper expects 16 kHz input
if sr != 16000:
    audio = torchaudio.transforms.Resample(sr, 16000)(audio)

# Convert the waveform to log-mel input features
input_features = processor(
    audio.numpy(), sampling_rate=16000, return_tensors="pt"
).input_features

# Generate token IDs and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
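
Whisper operates on 30-second windows, and the snippet above passes the whole waveform to the processor in a single call, so with the processor's default settings a longer recording would be cut off at 30 seconds. One simple workaround, sketched here under that assumption, is to split the 16 kHz waveform into consecutive windows and transcribe each one separately. The helper name `split_into_windows` is hypothetical and not part of any library.

```python
def split_into_windows(samples, sample_rate=16000, window_s=30.0):
    """Return consecutive, non-overlapping slices of at most window_s seconds.

    `samples` can be any sequence that supports len() and slicing,
    such as a list, a NumPy array, or a 1-D torch tensor.
    """
    window = int(sample_rate * window_s)
    if window <= 0:
        raise ValueError("window must cover at least one sample")
    return [
        samples[start:start + window]
        for start in range(0, len(samples), window)
    ]


# Toy check with 65 seconds of silence at 16 kHz.
waveform = [0.0] * (16000 * 65)
windows = split_into_windows(waveform)
print([len(w) / 16000 for w in windows])  # [30.0, 30.0, 5.0]
```

Each window can then go through the same processor and generate steps as above, with the decoded strings joined in order. Fixed boundaries can split a syllable in two, so overlapping windows or splitting at silences may give cleaner results; that is left out of this sketch.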