dialoglk
/

SinhalaVITS-TTS-M1

Model card Files Files and versions

KasunUoM commited on Nov 13, 2025

Commit

7fc5305

·

verified ·

1 Parent(s): 47911d2

Update README.md

Files changed (1) hide show

README.md +95 -3

README.md CHANGED Viewed

@@ -1,3 +1,95 @@
----
-license: mpl-2.0
----

+---
+license: mpl-2.0
+language:
+- si
+tags:
+- si
+- lk
+- dialog
+- male
+- tts
+- uom
+- vits
+---
+# SinhalaVITS-TTS-M2
+This is a fine-tuned Coqui TTS [Coqui TTS](https://github.com/coqui-ai/TTS) model specially for **Sinhala**, developed by **Dialog Axiata PLC** and the **Dialog – UoM Research Lab**.
+We fine-tuned it on a custom recorded dataset adapting a strong male voice.
+---
+## Features
+- Model architecture: VITS
+- Language: Sinhala (si-lk)
+- Training Sampling rate: 22050 Hz
+- Framework: Coqui TTS
+---
+## Dataset
+- Voice: Male (Sanjaya)
+- Recording Sampling Rate: 44100Hz
+- No. of Clips: 1096
+- Total Length: >100mins (~2 hrs.)
+## Training Specs
+- Hardware: NVidia GeForce GTX1060 6GB GPU
+- Training Time: **~85 hours**
+- Global Steps: 170,000
+- Batch Size: 16
+- Epochs:
+- Loss Convergence: Stable mel + KL losses
+## Installation
+You can run this model locally using the included Flask-based inference server. This server will automatically use CUDA if it's available on your system.
+1. First install requirements.
+   ```bash
+    pip install -r requirements.txt
+   ```
+2. Then start the API server
+  ```bash
+    python inference_M1.py
+  ```
+  _This starts a Flask server at http://localhost:8000._
+3. Then you can use curl or any HTTP client (like Postman) to send Sinhala text to the server.
+   The API endpoint is '/tts'
+  ```bash
+    curl -X POST http://localhost:8000/tts \
+       -H "Content-Type: application/json" \
+       -d '{"text": "ආයුබෝවන්. සිංහල ටෙක්ස්ට් එකක් දාලා බලමුද?"}' \
+       --output output.wav
+  ```
+4. This API will,
+    * Convert Sinhala text → Romanized Sinhala (via romanizer.py)
+    * Generate speech using the VITS model
+    * Return output.wav (Sinhala voice)
+## File Structure
+  ```bash
+    SinhalaVITS-TTS-M2/
+      ├── Sanjaya_170000.pth          # Fine-tuned VITS checkpoint
+      ├── Sanjaya_config.json         # Model configuration
+      ├── romanizer.py                # Sinhala → Roman converter
+      ├── inference_M1.py             # Flask-based inference server
+      ├── requirements.txt            # Required dependencies
+      ├── LICENSE                     # MPL-2.0 license
+      └── README.md                   # This file
+  ```
+## Contributors
+  * Kasun Ranasinghe (Dialog-UoM Reasearch Lab)
+  * Randika Silva (Dialog Axiata PLC)
+  * Vipula Wakkumbura (Dialog-UoM Reasearch Lab)
+## Acknowledgements
+  * PathNirvana (https://github.com/pathnirvana/coqui-tts) – Original Sinhala male VITS checkpoint and online Romanization toolkit
+  * Coqui TTS – Open-source TTS framework enabling the foundation of this work
+  * Sinhala dataset contributor (Sanjaya Nirodh) – for providing professional, quality speech samples
+## License
+This model is released under the MPL-2.0 license, the same as the original Sinhala TTS checkpoint by Pathnirvana.