TTS-AGI/audio-audio-clap-maestrino-sae-32x-k5


Audio-Audio CLAP SAE (32x expansion, k=5)

Sparse Autoencoder trained on 155M CLAP embeddings from 6 audio datasets.

Architecture

Input dimension: 512
Hidden dimension: 16384 (32x expansion)
Top-k: 5
Alive features: 14,127 / 16,384
Dead features: 2,257
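The dimensions above can be illustrated with a minimal NumPy sketch of a top-k encoder. This is not the released implementation: the weights are random stand-ins, and the bias handling and the ReLU on the kept values are assumptions, since the card does not describe them. The function name `topk_encode` is hypothetical.

```python
import numpy as np

D_IN, D_HIDDEN, K = 512, 16384, 5  # values from the Architecture list

def topk_encode(x, w_enc, b_enc, k=K):
    """Keep the k largest pre-activations per row and zero out the rest."""
    pre = x @ w_enc + b_enc  # (batch, D_HIDDEN)
    # argpartition finds the k largest entries per row without a full sort
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    latents = np.zeros_like(pre)
    # ReLU on the kept values is an assumption, not taken from the card
    latents[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    return latents

rng = np.random.default_rng(0)
# Random stand-in weights and inputs, NOT the trained parameters
w_enc = (rng.standard_normal((D_IN, D_HIDDEN)) * 0.02).astype(np.float32)
b_enc = np.zeros(D_HIDDEN, dtype=np.float32)
x = rng.standard_normal((4, D_IN)).astype(np.float32)

latents = topk_encode(x, w_enc, b_enc)
active_per_sample = (latents != 0).sum(axis=1)  # at most K nonzeros per row
```

With k = 5, each 16384-dimensional latent vector has at most five nonzero entries, which is what makes individual features easy to inspect.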

Training Data

MLS, CommonVoice, AudioSnippets, Maestrino, Emolia, Podcast
Total samples: 86,729,608

Usage

from sae import SparseAutoencoder

sae = SparseAutoencoder.load_from_disk("path/to/model")
# embeddings: CLAP embeddings with shape (batch, 512)
latents = sae.encode(embeddings)  # (batch, 512) -> (batch, 16384)
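Once latents are available, a common next step is to read off which features fired for each sample. The sketch below assumes the latents are a dense (batch, 16384) NumPy array with only a few nonzero entries per row; if `encode` returns a PyTorch tensor, it would need converting first (for example with `.cpu().numpy()`). The helper `active_features` is hypothetical, and the array here is a hand-built stand-in rather than real model output.

```python
import numpy as np

def active_features(latents, k=5):
    """Return (indices, values) of the k largest activations per row, strongest first."""
    idx = np.argsort(-latents, axis=1)[:, :k]
    vals = np.take_along_axis(latents, idx, axis=1)
    return idx, vals

# Stand-in for the encoder output: 2 samples, 16384 features, few nonzeros
latents = np.zeros((2, 16384), dtype=np.float32)
latents[0, [7, 42, 9000]] = [0.5, 1.5, 1.0]
latents[1, [3, 16383]] = [2.0, 0.25]

idx, vals = active_features(latents)

# Keep only the entries that actually fired (value > 0)
fired = [row_idx[row_val > 0].tolist() for row_idx, row_val in zip(idx, vals)]
```

Rows with fewer than k nonzero activations are padded with zero-valued entries in `idx` and `vals`, so filtering on `vals > 0` as above recovers only the features that fired.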

Stats

Total feature firings: 433,648,040
Mean firings per sample: 5.0
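The reported figures are mutually consistent, which can be checked with a few lines of arithmetic. The snippet uses only numbers stated in this card; the variable names are illustrative.

```python
total_samples = 86_729_608   # "Total samples"
total_firings = 433_648_040  # "Total feature firings"
hidden_dim = 16_384          # "Hidden dimension"
alive, dead = 14_127, 2_257  # "Alive features" / "Dead features"

# Each sample activates exactly k = 5 features, so firings = 5 * samples
mean_firings = total_firings / total_samples

# Alive and dead features should partition the hidden dimension
assert alive + dead == hidden_dim
alive_fraction = alive / hidden_dim

print(f"mean firings per sample: {mean_firings:.1f}")
print(f"alive fraction: {alive_fraction:.1%}")
```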
