---
language:
- dv
- en
license: apache-2.0
base_model: nari-labs/Dia-1.6B
tags:
- text-to-speech
- tts
- audio
- dhivehi
- maldivian
- speech-synthesis
- fine-tuned
library_name: dia
pipeline_tag: text-to-speech
datasets:
- alakxender/voice-synthetic
---

# Dia TTS - Dhivehi Fine-tuned Model

This is a fine-tuned version of [`nari-labs/Dia-1.6B`](https://huggingface.co/nari-labs/Dia-1.6B) specifically trained for Dhivehi (Maldivian) text-to-speech synthesis.

## Model Description

- **Base Model**: Dia-1.6B 
- **Language**: Dhivehi (dv), with support for mixed Dhivehi and English input
- **Task**: Text-to-Speech (TTS)
- **Fine-tuning**: Specialized for Dhivehi audio synthesis
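Because input may mix Dhivehi and English, it can help to check which scripts a prompt contains before synthesis. The helper below is a hypothetical utility, not part of the Dia library. It assumes Dhivehi text is written in Thaana, whose Unicode block runs from U+0780 to U+07BF, and it counts Thaana and ASCII Latin letters while ignoring spaces and punctuation.

```python
# Thaana (the Dhivehi script) occupies the Unicode block U+0780 to U+07BF.
THAANA_START, THAANA_END = 0x0780, 0x07BF


def char_script(ch: str) -> str:
    """Classify a single character as 'thaana', 'latin', or 'other'."""
    if THAANA_START <= ord(ch) <= THAANA_END:
        return "thaana"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # spaces, digits, punctuation (including the Arabic comma), etc.


def script_mix(text: str) -> dict[str, int]:
    """Count Thaana and Latin letters in a prompt."""
    counts = {"thaana": 0, "latin": 0}
    for ch in text:
        script = char_script(ch)
        if script in counts:
            counts[script] += 1
    return counts


print(script_mix("Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟"))
```

A prompt with non-zero counts for both scripts is a mixed-language input; Thaana vowel signs (fili) fall inside the same block and are counted as Thaana.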

## Usage

```python
# Install the Dia library and soundfile first:
#   pip install git+https://github.com/nari-labs/dia.git
#   pip install soundfile

SAMPLE_RATE = 44100  # Dia outputs audio at 44.1 kHz

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so missing packages reach the ImportError handler
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned checkpoint from the Hugging Face Hub
    print("📥 Loading model from Hugging Face...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-18k")
    print("✓ Model loaded successfully!")

    # Test prompts, keyed by output file name
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",

        # Mixed-language samples
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",

        # Emotional expressions and non-verbal sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",

        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)",
    }

    print("\n🗣️  Generating speech samples...")
    generated_files = []  # (filename, number of samples)

    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            print(f"   Text: {text[:60]}{'...' if len(text) > 60 else ''}")

            output = model.generate(text)  # generated waveform as an array of samples
            filename = f"{name}.wav"
            sf.write(filename, output, SAMPLE_RATE)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output) / SAMPLE_RATE:.2f}s)")

        except Exception as e:
            # Report the failure and continue with the remaining samples
            print(f"   ❌ Failed to generate {name}: {e}")

    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")

    total_duration = 0.0
    for filename, num_samples in generated_files:
        duration = num_samples / SAMPLE_RATE
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")

    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")

except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")

except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Check that the model ID is correct and that the Hugging Face Hub is reachable")
```
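Several test prompts embed parenthesized non-verbal cues such as `(laughs)` and `(claps)`, a convention inherited from the Dia base model. When comparing outputs with and without cues, a small regex helper can list or strip them. This is a hypothetical sketch, not part of the Dia API: it assumes cues are lowercase words in parentheses, as in the samples above, and it does not check whether a given cue is actually rendered by this fine-tuned checkpoint.

```python
import re

# Assumption: cues are lowercase words in parentheses, e.g. "(laughs)" or "(claps)",
# matching the pattern used in the test samples.
CUE_PATTERN = re.compile(r"\(([a-z][a-z ]*)\)")


def extract_cues(text: str) -> list[str]:
    """Return the non-verbal cues found in a prompt, in order of appearance."""
    return CUE_PATTERN.findall(text)


def strip_cues(text: str) -> str:
    """Remove all cues and collapse the leftover whitespace."""
    return " ".join(CUE_PATTERN.sub(" ", text).split())


sample = "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)"
print(extract_cues(sample))  # ['laughs', 'claps', 'claps', 'laughs']
print(strip_cues(sample))
```

Generating both `sample` and `strip_cues(sample)` gives a quick A/B comparison of how much the cues change the output.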

## Training Details

- **Base Model**: nari-labs/Dia-1.6B
- **Training Data**: Dhivehi audio dataset ([`alakxender/voice-synthetic`](https://huggingface.co/datasets/alakxender/voice-synthetic))
- **Fine-tuning Approach**: Direct training on Dhivehi audio without language tags
- **Checkpoint**: Step 18,000

## Model Performance

This model has been fine-tuned specifically for Dhivehi speech synthesis and is intended to generate natural-sounding speech from Dhivehi text input.

**Note:** This checkpoint was taken at step 18k of training; the full run is available at [`alakxender/Dia-1.6B-dhivehi-ep1`](https://huggingface.co/alakxender/Dia-1.6B-dhivehi-ep1).

## Limitations

- Optimized specifically for the Dhivehi language
- May not perform well on other languages
- Performance depends on input text quality and pronunciation patterns
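Since output quality depends on the input text, it may help to clean prompts before calling `model.generate`. The sketch below is a hypothetical preprocessing step using only the Python standard library. The model card does not say how the training text was normalized, so the choices here (NFC normalization, removing zero-width spaces and byte-order marks, collapsing whitespace) are assumptions.

```python
import unicodedata

# Hypothetical preprocessing step; not part of the Dia library.
# Zero-width spaces and byte-order marks often sneak in through copy-paste.
INVISIBLE_CHARS = {"\u200b", "\ufeff"}


def normalize_text(text: str) -> str:
    """Apply NFC normalization, drop invisible characters, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if ch not in INVISIBLE_CHARS)
    return " ".join(text.split())  # also trims leading/trailing whitespace


print(normalize_text("  Hello\u200b   world \n"))  # Hello world
```

The invisible characters are removed before splitting because Python's `str.split()` does not treat U+200B as whitespace.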

## License

This model is released under the Apache 2.0 License, following the original Dia model licensing.