KasunUoM committed on
Commit
7fc5305
·
verified ·
1 Parent(s): 47911d2

Update README.md

  1. README.md +95 -3
README.md CHANGED
@@ -1,3 +1,95 @@
1
- ---
2
- license: mpl-2.0
3
- ---
1
+ ---
2
+ license: mpl-2.0
3
+ language:
4
+ - si
5
+ tags:
6
+ - si
7
+ - lk
8
+ - dialog
9
+ - male
10
+ - tts
11
+ - uom
12
+ - vits
13
+ ---
14
+
15
+ # SinhalaVITS-TTS-M2
16
+ This is a fine-tuned [Coqui TTS](https://github.com/coqui-ai/TTS) model built specifically for **Sinhala**, developed by **Dialog Axiata PLC** and the **Dialog – UoM Research Lab**.
17
+
18
+ We fine-tuned it on a custom-recorded dataset to adapt it to a strong male voice.
19
+
20
+ ---
21
+ ## Features
22
+ - Model architecture: VITS
23
+ - Language: Sinhala (si-LK)
24
+ - Training sampling rate: 22050 Hz
25
+ - Framework: Coqui TTS
26
+ ---
27
+
28
+ ## Dataset
29
+ - Voice: Male (Sanjaya)
30
+ - Recording sampling rate: 44100 Hz
31
+ - No. of Clips: 1096
32
+ - Total Length: >100 min (~2 h)
33
+
34
+ ## Training Specs
35
+ - Hardware: NVIDIA GeForce GTX 1060 6 GB GPU
36
+ - Training Time: **~85 hours**
37
+ - Global Steps: 170,000
38
+ - Batch Size: 16
39
+ - Epochs:
40
+ - Loss Convergence: Stable mel + KL losses
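The figures above allow a rough back-of-the-envelope estimate of training throughput. The sketch below is only an estimate under two assumptions that the README does not confirm: that all 1096 clips were used for training, and that all 170,000 global steps were run during this fine-tuning (rather than continuing the step count of the original checkpoint). The variable names are illustrative.

```python
import math

# Figures quoted in the Dataset and Training Specs lists.
NUM_CLIPS = 1096
BATCH_SIZE = 16
GLOBAL_STEPS = 170_000
TRAINING_HOURS = 85  # approximate, as stated in the README

# Assumption: every clip is used for training and the last partial batch is kept.
steps_per_epoch = math.ceil(NUM_CLIPS / BATCH_SIZE)

# Assumption: all global steps were run during this fine-tuning.
approx_epochs = GLOBAL_STEPS / steps_per_epoch

# Average throughput over the whole run.
steps_per_hour = GLOBAL_STEPS / TRAINING_HOURS
seconds_per_step = TRAINING_HOURS * 3600 / GLOBAL_STEPS

print(f"steps per epoch:  {steps_per_epoch}")
print(f"approx. epochs:   {approx_epochs:.0f}")
print(f"steps per hour:   {steps_per_hour:.0f}")
print(f"seconds per step: {seconds_per_step:.2f}")
```

If either assumption does not hold, the epoch estimate changes accordingly; the throughput figures depend only on the stated step count and training time.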
41
+
42
+
43
+ ## Installation
44
+
45
+ You can run this model locally using the included Flask-based inference server. This server will automatically use CUDA if it's available on your system.
46
+
47
+ 1. Install the requirements.
48
+
49
+ ```bash
50
+ pip install -r requirements.txt
51
+ ```
52
+ 2. Start the API server.
53
+
54
+ ```bash
55
+ python inference_M1.py
56
+ ```
57
+ _This starts a Flask server at http://localhost:8000._
58
+
59
+ 3. Use curl or any HTTP client (such as Postman) to send Sinhala text to the server.
60
+ The API endpoint is `/tts`.
61
+ ```bash
62
+ curl -X POST http://localhost:8000/tts \
63
+ -H "Content-Type: application/json" \
64
+ -d '{"text": "ආයුබෝවන්. සිංහල ටෙක්ස්ට් එකක් දාලා බලමුද?"}' \
65
+ --output output.wav
66
+ ```
67
+ 4. The API will:
68
+ * Convert Sinhala text → Romanized Sinhala (via romanizer.py)
69
+ * Generate speech using the VITS model
70
+ * Return the synthesized Sinhala speech as WAV audio (saved as output.wav by the curl command above)
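The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running at http://localhost:8000 as described in step 2 and that `/tts` responds with raw WAV bytes, as the curl example suggests. The function names `build_tts_request` and `synthesize` are illustrative and are not part of this repository.

```python
import json
import urllib.request

# Assumption: the server from step 2 is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_tts_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command in step 3."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "output.wav", base_url: str = BASE_URL) -> str:
    """Send Sinhala text to /tts and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path


# Example call (requires the server to be running):
# synthesize("ආයුබෝවන්.")
```

Splitting request construction from the network call keeps the payload easy to inspect before anything is sent.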
71
+
72
+ ## File Structure
73
+ ```text
74
+ SinhalaVITS-TTS-M2/
75
+ ├── Sanjaya_170000.pth # Fine-tuned VITS checkpoint
76
+ ├── Sanjaya_config.json # Model configuration
77
+ ├── romanizer.py # Sinhala → Roman converter
78
+ ├── inference_M1.py # Flask-based inference server
79
+ ├── requirements.txt # Required dependencies
80
+ ├── LICENSE # MPL-2.0 license
81
+ └── README.md # This file
82
+ ```
83
+ ## Contributors
84
+
85
+ * Kasun Ranasinghe (Dialog-UoM Research Lab)
85
+ * Randika Silva (Dialog Axiata PLC)
86
+ * Vipula Wakkumbura (Dialog-UoM Research Lab)
88
+
89
+ ## Acknowledgements
90
+ * PathNirvana (https://github.com/pathnirvana/coqui-tts) – Original Sinhala male VITS checkpoint and online Romanization toolkit
91
+ * Coqui TTS – Open-source TTS framework enabling the foundation of this work
92
+ * Sinhala dataset contributor (Sanjaya Nirodh) – for providing professional-quality speech samples
93
+
94
+ ## License
95
+ This model is released under the MPL-2.0 license, the same license as the original Sinhala TTS checkpoint by PathNirvana.