---
language:
- dv
- en
license: apache-2.0
base_model: nari-labs/Dia-1.6B
tags:
- text-to-speech
- tts
- audio
- dhivehi
- maldivian
- speech-synthesis
- fine-tuned
library_name: dia
pipeline_tag: text-to-speech
datasets:
- alakxender/voice-synthetic
---

# Dia TTS - Dhivehi Fine-tuned Model

This is a fine-tuned version of [`nari-labs/Dia-1.6B`](https://huggingface.co/nari-labs/Dia-1.6B), trained specifically for Dhivehi (Maldivian) text-to-speech synthesis.

## Model Description

- **Base Model**: Dia-1.6B
- **Language**: Dhivehi (dv), including mixed Dhivehi/English text
- **Task**: Text-to-Speech (TTS)
- **Fine-tuning**: Specialized for Dhivehi audio synthesis

## Usage

```python
# Install the Dia library first:
#   pip install git+https://github.com/nari-labs/dia.git
#   pip install soundfile

SAMPLE_RATE = 44100  # sample rate (Hz) passed to sf.write and used for durations

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so the ImportError handler below
    # can actually report missing dependencies.
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned model
    print("📥 Loading model from HuggingFace...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-18k")
    print("✓ Model loaded successfully!")

    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",

        # Mixed language tests
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",

        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",

        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)",
    }

    print("\n🗣️ Generating speech samples...")
    generated_files = []  # (filename, sample count) for each successful sample

    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            # Show at most the first 60 characters of the input text
            preview = text[:60] + ("..." if len(text) > 60 else "")
            print(f"   Text: {preview}")

            output = model.generate(text)

            filename = f"{name}.wav"
            sf.write(filename, output, SAMPLE_RATE)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output) / SAMPLE_RATE:.2f}s)")
        except Exception as e:
            # Keep going so one failed sample does not abort the rest
            print(f"   ❌ Failed to generate {name}: {e}")

    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")

    total_duration = 0
    for filename, samples in generated_files:
        duration = samples / SAMPLE_RATE
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")

    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")

except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")
except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure the model was uploaded correctly and is accessible")
```

## Training Details

- **Base Model**: nari-labs/Dia-1.6B
- **Training Data**: Dhivehi audio dataset (`alakxender/voice-synthetic`)
- **Fine-tuning Approach**: Direct training on Dhivehi audio without language tags
- **Checkpoint**: Step 18,000

## Model Performance

This model has been fine-tuned specifically for Dhivehi speech synthesis, with the aim of generating natural-sounding speech from Dhivehi text input.

**Note:** This checkpoint is from step 18,000 of training. The full run is available at [`alakxender/Dia-1.6B-dhivehi-ep1`](https://huggingface.co/alakxender/Dia-1.6B-dhivehi-ep1).

## Limitations

- Optimized specifically for the Dhivehi language
- May not perform well on other languages
- Performance depends on input text quality and pronunciation patterns

## License

This model is released under the Apache 2.0 License, following the licensing of the original Dia model.
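
As the Limitations section notes, the model is tuned for Dhivehi and may not perform well on other languages, so it can be useful to check how much of an input is written in Thaana script before synthesizing it. The following is a minimal sketch of such a check. The helper name `thaana_ratio` is hypothetical and is not part of the Dia library; it relies only on the Unicode Thaana block (U+0780 to U+07BF).

```python
def thaana_ratio(text: str) -> float:
    """Return the fraction of letters in `text` that fall in the Thaana block.

    Hypothetical helper for illustration; not part of the Dia library.
    Only characters for which str.isalpha() is true are counted, so
    punctuation, digits, spaces, and combining vowel signs are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thaana = sum(1 for ch in letters if "\u0780" <= ch <= "\u07bf")
    return thaana / len(letters)


if __name__ == "__main__":
    samples = {
        "basic_english": "Hello, this is a test.",
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you?",
    }
    for name, text in samples.items():
        print(f"{name}: {thaana_ratio(text):.2f} Thaana letters")
```

A caller could, for example, warn when the ratio is low for a long input. Any threshold would be a judgment call and is not something this model card specifies.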