TinyStories-1M Indonesian Fine-tune (Experimental)

An experimental model fine-tuned from TinyStories-1M on an Indonesian-language dataset, built for exploration and learning purposes.

⚠️ Note: This is an experimental model for testing purposes only. Performance is far from optimal.

Model Details

Model Description

This model is a fine-tuned version of TinyStories-1M using an Indonesian language dataset. The primary purpose is for experimentation and learning, not for production use.

Model Sources

  • Base model: TinyStories-1M (roneneldan/TinyStories-1M)
  • Training dataset: Lyon28/Corpus-Indonesia

Performance Metrics

⚠️ Warning: This model is still in early stages and performance is not optimal.

Training Loss & Perplexity

Training loss remains high across checkpoints, and the perplexity values show the model has not converged well (a quick consistency check follows the table):

Rank   Train Loss   Perplexity
1      5.092371     162.775409
2      5.710950     302.158057
3      9.836301     18,700.406340
4      11.639643    113,509.674623
5      11.639969    113,546.630401
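
Each perplexity value is the exponential of the corresponding train loss, so the table can be reproduced directly from the loss column:

```python
import math

# Perplexity for a causal language model is exp(cross-entropy loss),
# which reproduces the table above from the loss column alone.
for loss in [5.092371, 5.710950, 9.836301, 11.639643, 11.639969]:
    print(f"loss = {loss:.6f}  ->  perplexity = {math.exp(loss):,.6f}")
```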

Training Details

  • Training Time: >3 hours
  • Hardware: T4 GPU
  • Training Regime: Full fine-tuning

Uses

Direct Use

This model can be used for the following (a minimal loading sketch follows the list):

  • Experimentation with Indonesian language modeling
  • Learning about model fine-tuning
  • Research and development
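
A minimal sketch for loading the checkpoint with the transformers library; the repository id below is a placeholder, so substitute the actual id of this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id; replace with this model's actual repository id.
model_id = "your-username/tinystories-1m-indonesian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Indonesian prompt: "One day, a little cat"
prompt = "Pada suatu hari, seekor kucing kecil"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the high perplexity reported above, expect incoherent output; adjusting sampling parameters will not fix that.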

Out-of-Scope Use

NOT recommended for:

  • Production applications
  • Critical tasks requiring high accuracy
  • Professional text generation

This model is still experimental; its very high perplexity indicates poor prediction quality.

Bias, Risks, and Limitations

  • High Perplexity: Model shows very high perplexity (>100k on some checkpoints), indicating highly uncertain predictions
  • Training Loss: High loss indicates the model has not learned optimally
  • Experimental Status: This model was created for experimentation, not for serious applications
  • Data Bias: Model may inherit biases from the Lyon28/Corpus-Indonesia dataset

Recommendations

  • Use only for learning and experimentation purposes
  • Not recommended for production use
  • Requires further training with hyperparameter tuning for better results (see the training sketch after this list)
  • Consider increasing epochs, adjusting learning rate, or using a larger dataset
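
A minimal full fine-tuning sketch along these lines, assuming roneneldan/TinyStories-1M as the base model id, a "text" column in Lyon28/Corpus-Indonesia, and a train split (all assumptions); the hyperparameter values are starting points to sweep, not recommendations:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "roneneldan/TinyStories-1M"  # assumed HF id of the base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Dataset named in this card; the "text" column and "train" split are assumptions.
dataset = load_dataset("Lyon28/Corpus-Indonesia", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="tinystories-1m-id",
    num_train_epochs=3,              # more epochs; tunable
    learning_rate=5e-5,              # starting point; sweep this
    per_device_train_batch_size=16,
    fp16=True,                       # T4 supports fp16
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```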

Evaluation

Results

The model shows suboptimal performance across all checkpoints:

  • Best checkpoint: train loss 5.092371, perplexity 162.775409
  • Worst checkpoint: train loss 11.639969, perplexity 113,546.630401 (a minimal held-out perplexity check is sketched at the end of this section)

All metrics indicate that the model requires:

  • Further training
  • Hyperparameter tuning
  • Possibly a better architecture or a more suitable dataset
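
To track progress after further training, perplexity should be measured on held-out Indonesian text rather than the training set. A minimal sketch, reusing the model and tokenizer loaded earlier (the sample sentence is illustrative, not from the dataset):

```python
import math
import torch

# Held-out sentence (illustrative): "Mother went to the market to buy vegetables."
text = "Ibu pergi ke pasar untuk membeli sayur."
enc = tokenizer(text, return_tensors="pt")

# For a causal LM, passing labels = input_ids yields the mean
# cross-entropy loss; transformers shifts the labels internally.
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

loss = out.loss.item()
print(f"eval loss = {loss:.4f}, perplexity = {math.exp(loss):,.2f}")
```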

Technical Specifications

  • Model size: 3.75M params
  • Tensor type: F32
  • Format: Safetensors