Music Emotion Classification Model
This repository hosts a music emotion classification model fine-tuned from MIT/ast-finetuned-audioset-10-10-0.4593 using an iterative training and data augmentation strategy.
The model predicts emotional labels from 30-second music audio clips.
Model Overview
Base model: MIT/ast-finetuned-audioset-10-10-0.4593
Task: Music Emotion Classification
Input: 30-second audio clips
Output: Emotion class labels
Framework: PyTorch / Hugging Face Transformers
Dataset
The model was trained on the Music by Emotion dataset available on Hugging Face.
Source: SoundCloud
Size: 1,000 audio samples
Duration: 30 seconds per sample
Labels: Emotion categories
Split: Train / Validation / Test
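The split ratios are not stated here. As a minimal sketch, assuming an 80/10/10 split and four hypothetical emotion labels, a stratified split over sample indices could look like this:

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/validation/test, keeping label balance.

    `ratios` is an assumed 80/10/10 split; the model card does not state one.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Hypothetical labels: 1,000 samples spread evenly over 4 emotion classes.
labels = [("happy", "sad", "angry", "calm")[i % 4] for i in range(1000)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting before any augmentation keeps augmented variants of a test clip out of the training set.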
Training Process
Initial Fine-Tuning
The base Audio Spectrogram Transformer (AST) model was fine-tuned on the original dataset.
After this initial fine-tuning, the model reached approximately 40% accuracy.
Iterative Training & Data Augmentation
The dataset was augmented to improve robustness and generalization.
The fine-tuned model was trained iteratively on the augmented data.
This significantly improved performance across all classes, raising accuracy from approximately 40% to the 99% test accuracy reported under Performance.
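The specific augmentations are not listed here. The sketch below shows three common waveform-level transforms (random gain, circular time shift, additive Gaussian noise) using NumPy. The function name `augment_waveform`, the choice of transforms, and all default strengths are illustrative assumptions, not the settings used to train this model.

```python
import numpy as np

def augment_waveform(waveform, rng, noise_std=0.005, max_shift=0.1, gain_db=6.0):
    """Return a randomly perturbed copy of a mono waveform (1-D float array).

    The transforms and their default strengths are illustrative assumptions.
    """
    out = waveform.astype(np.float32)  # astype returns a copy

    # Random gain in [-gain_db, +gain_db] decibels.
    gain = 10.0 ** (rng.uniform(-gain_db, gain_db) / 20.0)
    out = out * gain

    # Circular time shift by up to max_shift of the clip length.
    limit = int(len(out) * max_shift)
    out = np.roll(out, rng.integers(-limit, limit + 1))

    # Additive Gaussian noise.
    out = out + rng.normal(0.0, noise_std, size=out.shape).astype(np.float32)

    # Keep samples in the usual [-1, 1] floating-point audio range.
    return np.clip(out, -1.0, 1.0)

rng = np.random.default_rng(0)
sample_rate = 16_000  # the AST feature extractor defaults to 16 kHz audio
t = np.arange(30 * sample_rate) / sample_rate
clip = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)  # synthetic 30 s tone
augmented = augment_waveform(clip, rng)
```

Each call yields a different variant of the same clip, so one labeled sample can contribute several training examples.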
Performance
Final Test Accuracy: 99%
Evaluation Metric: Classification Accuracy
Inference Speed: Optimized for real-time or batch audio classification
Note: The high accuracy reflects performance on the augmented dataset and test split.
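For reference, classification accuracy is the fraction of clips whose predicted label matches the reference label. The sketch below computes it along with a per-class breakdown. The label names and predictions are hypothetical, not results from this model.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each reference label (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels and predictions, for illustration only.
y_true = ["happy", "happy", "sad", "sad", "calm", "calm"]
y_pred = ["happy", "sad", "sad", "sad", "calm", "calm"]
overall = accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
print(overall, by_class)
```

A per-class view is useful alongside the overall figure because a single accuracy number can hide a weak class.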
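Usage
    from transformers import AutoProcessor, AutoModelForAudioClassification
    import librosa
    import torch

    processor = AutoProcessor.from_pretrained("your-username/your-model-name")
    model = AutoModelForAudioClassification.from_pretrained("your-username/your-model-name")

    # The processor expects a raw waveform, not a file path; load 16 kHz mono audio first.
    waveform, sample_rate = librosa.load("audio.wav", sr=16000, mono=True)
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring emotion class.
    predicted_class = outputs.logits.argmax(dim=-1)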
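The predicted class is an integer index. A minimal sketch of turning logits into a label name and a softmax confidence follows. The helper `top_prediction`, the label mapping, and the logit values are hypothetical. With a loaded Transformers model, the mapping is available as `model.config.id2label` and the logits as `outputs.logits[0].tolist()`.

```python
import math

def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical mapping and logits, for illustration only.
id2label = {0: "angry", 1: "calm", 2: "happy", 3: "sad"}
label, prob = top_prediction([0.2, 1.1, 3.4, -0.5], id2label)
print(label, round(prob, 3))
```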
Limitations
The model is trained on 30-second music clips and may not generalize well to:
Very short audio samples
Non-music audio (speech, noise, environmental sounds)
Emotion labels are subjective and dataset-dependent.
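Because the model expects 30-second input, clips of other lengths need to be brought to that duration first. The sketch below zero-pads short clips and center-crops long ones. The helper `fit_to_duration` and the 16 kHz sample rate are assumptions, and padding does not guarantee good predictions on very short clips.

```python
import numpy as np

def fit_to_duration(waveform, sample_rate=16_000, seconds=30.0):
    """Zero-pad or center-crop a mono waveform to exactly `seconds` long."""
    target = int(round(sample_rate * seconds))
    n = len(waveform)
    if n >= target:
        # Keep the middle `target` samples of a long clip.
        start = (n - target) // 2
        return waveform[start:start + target]
    # Pad a short clip with zeros (silence), split across both ends.
    pad_total = target - n
    left = pad_total // 2
    return np.pad(waveform, (left, pad_total - left))

short_clip = np.ones(10 * 16_000, dtype=np.float32)  # synthetic 10 s clip
long_clip = np.ones(45 * 16_000, dtype=np.float32)   # synthetic 45 s clip
fitted_short = fit_to_duration(short_clip)
fitted_long = fit_to_duration(long_clip)
print(len(fitted_short), len(fitted_long))
```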
License
Please refer to the original dataset and base model licenses:
Base Model: license of MIT/ast-finetuned-audioset-10-10-0.4593 (AST) on Hugging Face
Dataset: Music by Emotion dataset license on Hugging Face
Acknowledgements
MIT CSAIL for the AST architecture
Hugging Face for model hosting and tooling