junnei committed on
Commit 813fca0 · verified · 1 Parent(s): d69522c

Update README.md

Files changed (1)
  1. README.md +63 -39
README.md CHANGED
@@ -1,54 +1,78 @@
  ---
  library_name: transformers
- tags:
- - generated_from_trainer
  model-index:
- - name: Phi-4-multimodal-instruct-ko-speech
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # Phi-4-multimodal-instruct-ko-speech

- This model was trained from scratch on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 4e-05
- - train_batch_size: 32
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 128
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.95) and epsilon=1e-07 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 2
-
- ### Training results
-
- ### Framework versions
-
- - Transformers 4.48.2
- - Pytorch 2.6.0+cu124
- - Datasets 3.3.2
- - Tokenizers 0.21.0

  ---
  library_name: transformers
+ datasets:
+ - Bingsu/zeroth-korean
+ - google/fleurs
+ language:
+ - ko
+ metrics:
+ - cer
+ - wer
+ - bleu
+ base_model:
+ - microsoft/Phi-4-multimodal-instruct
  model-index:
+ - name: Phi-4-multimodal-instruct-ko-asr
+   results:
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       type: Bingsu/zeroth-korean
+       name: zeroth-korean-test
+     metrics:
+     - type: bleu
+       name: zeroth-test-BLEU
+       value: 94.837
+     - type: cer
+       name: zeroth-test-CER
+       value: 1.316
+     - type: wer
+       name: zeroth-test-WER
+       value: 2.951
+   - task:
+       type: automatic-speech-recognition
+     dataset:
+       type: google/fleurs
+       name: fleurs-ko-test
+     metrics:
+     - type: bleu
+       name: fleurs-test-BLEU
+       value: 67.659
+     - type: cer
+       name: fleurs-test-CER
+       value: 7.951
+     - type: wer
+       name: fleurs-test-WER
+       value: 18.313
+ pipeline_tag: automatic-speech-recognition
  ---

+ This model is fine-tuned from [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) on [Bingsu/zeroth-korean](https://huggingface.co/datasets/Bingsu/zeroth-korean) and [google/fleurs](https://huggingface.co/datasets/google/fleurs) for 5 epochs.

+ The model was trained for 960 steps on these datasets for Korean automatic speech recognition (ASR) on an H100 GPU.

+ After that, training continued on the [CoVoST2 dataset](https://huggingface.co/datasets/junnei/covost2) ([Korean-only subset](https://huggingface.co/datasets/junnei/covost2-ko)) for AST (automatic speech translation).
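Inference with this checkpoint should follow the base model's conventions. The sketch below assumes the `<|user|><|audio_1|>...<|end|><|assistant|>` prompt template and the `audios=[(array, sample_rate)]` processor argument documented on the microsoft/Phi-4-multimodal-instruct card; neither has been verified against this specific checkpoint, and the heavy model load is kept inside a function so the file imports without a GPU:

```python
# Hypothetical inference sketch; prompt template and processor call are
# assumptions taken from the base Phi-4-multimodal-instruct model card.
MODEL_ID = "junnei/Phi-4-multimodal-instruct-ko-asr"

def build_asr_prompt(instruction: str = "Transcribe the audio clip into text.") -> str:
    """Build the single-audio chat prompt used by Phi-4-multimodal."""
    return f"<|user|><|audio_1|>{instruction}<|end|><|assistant|>"

def transcribe(audio_path: str) -> str:
    """Load the model and transcribe one audio file (requires GPU + network)."""
    import soundfile as sf
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
    )
    audio, sample_rate = sf.read(audio_path)
    inputs = processor(
        text=build_asr_prompt(), audios=[(audio, sample_rate)], return_tensors="pt"
    ).to(model.device)
    generate_ids = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens; decode only the newly generated transcription.
    new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```
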

+ ## Evaluation

+ Evaluation was done on the following datasets:
+ - ASR (automatic speech recognition): evaluated with CER (character error rate) on the zeroth-korean test set (457 samples).
+ - AST (automatic speech translation): evaluated with BLEU score on fleurs ko <-> en speech-translation results (270 samples).

+ The evaluation script was taken from [here](https://gist.github.com/seastar105/d1d8983b27611370528e3b194dcc5577#file-evaluate-py).
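For reference, the CER, WER, and BLEU metrics used above can be reproduced in spirit with textbook implementations. This is an illustrative sketch, not the actual evaluation script, which applies its own tokenization and corpus-level BLEU:

```python
import math
from collections import Counter

def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return levenshtein(list(ref), list(hyp)) / len(ref)

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    return levenshtein(ref.split(), hyp.split()) / len(ref.split())

def bleu(ref_tokens, hyp_tokens, max_n=4):
    """Sentence-level BLEU (uniform weights, brevity penalty), scaled to 0-100."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
        hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(ref_tokens) / len(hyp_tokens)))
    return 100 * brevity * geo_mean
```

Lower is better for CER/WER; higher is better for BLEU, which is why the table below bolds small error rates and large BLEU scores.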

+ Compared to [Phi-4-mm-inst-zeroth-kor](https://huggingface.co/seastar105/Phi-4-mm-inst-zeroth-kor) and [Phi-4-multimodal-finetune-ko-speech](https://huggingface.co/daekeun-ml/Phi-4-multimodal-finetune-ko-speech), ASR performance is significantly improved.

+ | Model | zeroth-CER | zeroth-WER | fleurs-ko2en | fleurs-ko2en-cot | fleurs-en2ko | fleurs-en2ko-cot |
+ |--------------------------------------------------|------------|------------|--------------|------------------|--------------|------------------|
+ | original | 198.32 | - | 5.63 | 2.42 | 6.86 | 4.17 |
+ | daekeun-ml/Phi-4-multimodal-finetune-ko-speech | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 |
+ | seastar105/Phi-4-mm-inst-zeroth-kor | 7.02 | - | 7.07 | 9.19 | 13.08 | 9.35 |
+ | [**ASR finetune**][ASR] | **1.31** | 2.95 | 7.46 | 6.24 | 12.15 | 8.91 |
+ | + 1 epoch finetune with [CoVoST2-ko][Covost2-ko] | 3.88 | - | **8.07** | **10.09** | **18.82** | **15.41** |
+ | **AST fine-tuned model (this model)** | **1.77** | **2.99** | **8.01** | **9.09** | **17.09** | **11.82** |

+ [Covost2-ko]: https://huggingface.co/datasets/junnei/covost2-ko
+ [ASR]: https://huggingface.co/junnei/Phi-4-multimodal-instruct-ko-asr