TinyStories-1M Indonesian Fine-tune (Experimental)

An experimental model fine-tuned from TinyStories-1M on an Indonesian-language dataset, built for exploration and learning purposes.

⚠️ Note: This is an experimental model for testing purposes only. Performance is far from optimal.

Model Details

Model Description

This model is a fine-tuned version of TinyStories-1M using an Indonesian language dataset. The primary purpose is for experimentation and learning, not for production use.

Model Sources

  • Base model: TinyStories-1M (roneneldan/TinyStories-1M)
  • Training dataset: Lyon28/Corpus-Indonesia

Performance Metrics

⚠️ Warning: This model is still in early stages and performance is not optimal.

Training Loss & Perplexity

Training loss remains high across checkpoints, and the perplexity values show the model has not converged well (a quick consistency check follows the table):

Rank   Train Loss   Perplexity
1      5.092371     162.775409
2      5.710950     302.158057
3      9.836301     18,700.406340
4      11.639643    113,509.674623
5      11.639969    113,546.630401
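
Each perplexity value is the exponential of the corresponding train loss, so the table can be reproduced directly from the loss column:

```python
import math

# Perplexity for a causal language model is exp(cross-entropy loss),
# which reproduces the table above from the loss column alone.
for loss in [5.092371, 5.710950, 9.836301, 11.639643, 11.639969]:
    print(f"loss = {loss:.6f}  ->  perplexity = {math.exp(loss):,.6f}")
```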

Training Details

  • Training Time: >3 hours
  • Hardware: T4 GPU
  • Training Regime: Full fine-tuning

Uses

Direct Use

This model can be used for the following (a minimal loading sketch follows the list):

  • Experimentation with Indonesian language modeling
  • Learning about model fine-tuning
  • Research and development
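
A minimal sketch for loading the checkpoint with the transformers library; the repository id below is a placeholder, so substitute the actual id of this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id; replace with this model's actual repository id.
model_id = "your-username/tinystories-1m-indonesian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Indonesian prompt: "One day, a little cat"
prompt = "Pada suatu hari, seekor kucing kecil"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the high perplexity reported above, expect incoherent output; adjusting sampling parameters will not fix that.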

Out-of-Scope Use

NOT recommended for:

  • Production applications
  • Critical tasks requiring high accuracy
  • Professional text generation

This model is still experimental; its very high perplexity indicates poor prediction quality.

Bias, Risks, and Limitations

  • High Perplexity: Model shows very high perplexity (>100k on some checkpoints), indicating highly uncertain predictions
  • Training Loss: High loss indicates the model has not learned optimally
  • Experimental Status: This model was created for experimentation, not for serious applications
  • Data Bias: Model may inherit biases from the Lyon28/Corpus-Indonesia dataset

Recommendations

  • Use only for learning and experimentation purposes
  • Not recommended for production use
  • Requires further training with hyperparameter tuning for better results (see the training sketch after this list)
  • Consider increasing epochs, adjusting learning rate, or using a larger dataset
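
A minimal full fine-tuning sketch along these lines, assuming roneneldan/TinyStories-1M as the base model id, a "text" column in Lyon28/Corpus-Indonesia, and a train split (all assumptions); the hyperparameter values are starting points to sweep, not recommendations:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "roneneldan/TinyStories-1M"  # assumed HF id of the base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Dataset named in this card; the "text" column and "train" split are assumptions.
dataset = load_dataset("Lyon28/Corpus-Indonesia", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="tinystories-1m-id",
    num_train_epochs=3,              # more epochs; tunable
    learning_rate=5e-5,              # starting point; sweep this
    per_device_train_batch_size=16,
    fp16=True,                       # T4 supports fp16
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```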

Evaluation

Results

The model shows suboptimal performance across all checkpoints:

  • Best checkpoint: train loss 5.092371, perplexity 162.775409
  • Worst checkpoint: train loss 11.639969, perplexity 113,546.630401 (a minimal held-out perplexity check is sketched at the end of this section)

All metrics indicate that the model requires:

  • Further training
  • Hyperparameter tuning
  • Possibly a better architecture or a more suitable dataset
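
To track progress after further training, perplexity should be measured on held-out Indonesian text rather than the training set. A minimal sketch, reusing the model and tokenizer loaded earlier (the sample sentence is illustrative, not from the dataset):

```python
import math
import torch

# Held-out sentence (illustrative): "Mother went to the market to buy vegetables."
text = "Ibu pergi ke pasar untuk membeli sayur."
enc = tokenizer(text, return_tensors="pt")

# For a causal LM, passing labels = input_ids yields the mean
# cross-entropy loss; transformers shifts the labels internally.
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

loss = out.loss.item()
print(f"eval loss = {loss:.4f}, perplexity = {math.exp(loss):,.2f}")
```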

Technical Specifications

  • Model size: 3.75M params
  • Tensor type: F32
  • Format: Safetensors