🎵 Music Emotion Classification Model

This repository hosts a music emotion classification model fine-tuned from MIT/ast-finetuned-audioset-10-10-0.4593 using an iterative training and data augmentation strategy.

The model predicts emotional labels from 30-second music audio clips.

📌 Model Overview

Base model: MIT/ast-finetuned-audioset-10-10-0.4593

Task: Music Emotion Classification

Input: 30-second audio clips

Output: Emotion class labels

Framework: PyTorch / Hugging Face Transformers

📂 Dataset

The model was trained on the Music by Emotion dataset available on Hugging Face.

Source: SoundCloud

Size: 1,000 audio samples

Duration: 30 seconds per sample

Labels: Emotion categories

Split: Train / Validation / Test
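The card does not state the split ratios or how the split was made. As a minimal sketch, assuming an 80/10/10 stratified split, the 1,000 samples could be divided as follows. The ratios, the `stratified_split` helper, and the emotion labels are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists with per-class proportions preserved.

    The 80/10/10 ratios are an assumption; the model card does not state them.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Hypothetical labels: 1,000 samples spread evenly over four made-up classes.
labels = [["happy", "sad", "calm", "angry"][i % 4] for i in range(1000)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting by class keeps rare emotions represented in every split, which matters when only 1,000 clips are available.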

🧠 Training Process

Initial Fine-Tuning

The base Audio Spectrogram Transformer (AST) model was fine-tuned on the original dataset.

The initial fine-tuned model reached approximately 40% accuracy.

Iterative Training & Data Augmentation

The dataset was augmented to improve robustness and generalization.

The fine-tuned model was trained iteratively on the augmented data.

Accuracy improved across all classes, rising from approximately 40% after initial fine-tuning to the reported 99% on the test split.
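The card does not say which augmentations were applied. The sketch below shows three common waveform-level ones (random gain, circular time shift, additive noise) using NumPy. The `augment_waveform` helper, its parameter values, and the synthetic sine-wave clip are illustrative assumptions, not the author's pipeline.

```python
import numpy as np

def augment_waveform(waveform, rng, noise_std=0.005, max_gain_db=6.0, max_shift=0.1):
    """Return a randomly perturbed copy of a mono waveform (1-D float array).

    All parameter values are illustrative assumptions.
    """
    # Random gain in decibels, converted to a linear amplitude factor.
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    augmented = waveform * (10.0 ** (gain_db / 20.0))

    # Circular time shift by up to max_shift of the clip length, in either direction.
    limit = int(max_shift * len(waveform))
    augmented = np.roll(augmented, rng.integers(-limit, limit + 1))

    # Additive Gaussian noise.
    augmented = augmented + rng.normal(0.0, noise_std, size=len(waveform))

    # Keep samples in the usual [-1, 1] range for floating-point audio.
    return np.clip(augmented, -1.0, 1.0).astype(np.float32)

# Synthetic stand-in for a 30-second clip at 16 kHz (a 440 Hz sine wave).
sampling_rate = 16000
t = np.arange(30 * sampling_rate) / sampling_rate
clip = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

rng = np.random.default_rng(0)
variants = [augment_waveform(clip, rng) for _ in range(3)]
print([v.shape for v in variants])
```

One design choice worth considering: augmenting only the training split, after the data has been split, keeps augmented copies of a clip out of the validation and test sets.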

📊 Performance

Final Test Accuracy: 99%

Evaluation Metric: Classification Accuracy

Inference: Suitable for real-time or batch audio classification

Note: The 99% accuracy was measured on the test split of the augmented dataset and may not carry over to unaugmented or out-of-distribution audio.
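Classification accuracy is the fraction of clips whose predicted label matches the reference label. A minimal sketch of the overall and per-class versions follows; the function names and the toy labels are hypothetical and do not come from the model's evaluation.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each reference label."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Toy example with made-up labels: one "sad" clip is misclassified as "happy".
y_true = ["happy", "sad", "calm", "sad", "happy", "calm"]
y_pred = ["happy", "sad", "calm", "happy", "happy", "calm"]

print(accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))  # {'happy': 1.0, 'sad': 0.5, 'calm': 1.0}
```

Reporting per-class accuracy next to the overall figure shows whether a high score holds for every emotion or is carried by the most frequent ones.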

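🚀 Usage

    from transformers import AutoProcessor, AutoModelForAudioClassification
    import librosa  # one option for loading audio; any loader returning a mono float array works
    import torch

    processor = AutoProcessor.from_pretrained("your-username/your-model-name")
    model = AutoModelForAudioClassification.from_pretrained("your-username/your-model-name")

    # The processor expects a raw waveform, not a file path.
    # Load the file and resample to 16 kHz, the rate used by the AST base checkpoint.
    waveform, sampling_rate = librosa.load("audio.wav", sr=16000)
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring emotion class
    predicted_class = outputs.logits.argmax(dim=-1)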
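`predicted_class` is an integer index. Transformers classification models usually carry an index-to-label mapping in `model.config.id2label`. The sketch below shows that lookup together with a softmax over the logits, using a stand-in dictionary and made-up logits so it runs without downloading the model. The label names are hypothetical, since the card does not list the emotion classes.

```python
import math

def top_predictions(logits, id2label, k=3):
    """Convert raw logits for one clip into the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Stand-ins for model.config.id2label and outputs.logits[0].tolist();
# both the label names and the logit values are made up.
id2label = {0: "happy", 1: "sad", 2: "calm", 3: "angry"}
logits = [0.2, 2.5, -1.0, 0.7]

for label, prob in top_predictions(logits, id2label):
    print(f"{label}: {prob:.3f}")
```

Since emotion labels are subjective, showing the top few classes with their probabilities can be more informative than a single hard label.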

⚠️ Limitations

The model is trained on 30-second music clips and may not generalize well to:

Very short audio samples

Non-music audio (speech, noise, environmental sounds)

Emotion labels are subjective and dataset-dependent.
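Because the model expects 30-second clips, longer tracks need to be cut into windows before classification, and fragments much shorter than that fall under the short-audio limitation listed here. A minimal sketch of that bookkeeping follows. The `clip_windows` helper, the 50% minimum-length threshold, and the 16 kHz sampling rate are assumptions for illustration.

```python
def clip_windows(num_samples, sampling_rate=16000, clip_seconds=30.0, min_fraction=0.5):
    """Split a track of num_samples into consecutive clip-length windows.

    Returns (start, end) sample ranges. A trailing fragment is kept only if it
    covers at least min_fraction of a full clip; that threshold is an assumption.
    """
    clip_len = int(clip_seconds * sampling_rate)
    windows = []
    for start in range(0, num_samples, clip_len):
        end = min(start + clip_len, num_samples)
        if end - start >= min_fraction * clip_len:
            windows.append((start, end))
    return windows

# A 75-second track at 16 kHz: two full windows plus a 15-second remainder.
print(clip_windows(75 * 16000))  # [(0, 480000), (480000, 960000), (960000, 1200000)]

# A 10-second track is shorter than the threshold, so no window is returned.
print(clip_windows(10 * 16000))  # []
```

Each returned range can be sliced from the waveform and classified separately, with the per-window predictions then combined, for example by majority vote.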

📜 License

Please refer to the original dataset and base model licenses:

Base Model: MIT/ast-finetuned-audioset-10-10-0.4593 license on Hugging Face

Dataset: Music by Emotion dataset license on Hugging Face

🙌 Acknowledgements

MIT CSAIL for the AST architecture

Hugging Face for model hosting and tooling

SoundCloud contributors for the audio data

📦 Model Details

Repository: LaurenGurgiolo/Music_by_Emotion

Format: Safetensors

Model size: 86.2M params

Tensor type: F32