Highly experimental proper MoE

!! THE GATE IS NOT FULLY ALIGNED !!

Based on smollmv2 (a Llama-architecture model), MoE-ified and then further trained on a general dataset.

Info:

- MoE layers: [8, 12, 16, 20, 24, 28]
- Top-k: 2 (2 of the 4 experts, i.e. 50.0%, active per token)
- Hidden size: 960
- Total parameters: 494,554,560
- Trainable parameters: 494,554,560
- Auxiliary loss weight: 0.01
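
The card gives the routing setup but no code, so here is a minimal, self-contained sketch (not this repo's actual implementation) of what a top-2 gate over 4 experts with a load-balancing auxiliary loss can look like. The hidden size (960), expert count (4, per the "x4E" in the repo name), top-k (2), and aux weight (0.01) come from the card; the expert MLP width, the SiLU activation, and the exact aux-loss formula are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    """Top-k gated mixture of experts with a load-balancing aux loss (sketch)."""

    def __init__(self, hidden=960, n_experts=4, top_k=2, ffn=2560):
        super().__init__()
        self.top_k = top_k
        self.n_experts = n_experts
        self.gate = nn.Linear(hidden, n_experts, bias=False)  # the (unaligned) gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(n_experts)
        )

    def forward(self, x):                     # x: (batch, seq, hidden)
        flat = x.reshape(-1, x.shape[-1])     # (tokens, hidden)
        probs = self.gate(flat).softmax(dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)        # top-2 per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize

        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            tok, slot = (idx == e).nonzero(as_tuple=True)    # tokens routed to e
            if tok.numel():
                out[tok] += weights[tok, slot].unsqueeze(-1) * expert(flat[tok])

        # Load-balancing aux loss (one common formulation): mean selection rate
        # per expert times its mean gate probability, scaled by n_experts.
        sel = F.one_hot(idx, self.n_experts).float().sum(dim=1)  # (tokens, experts)
        aux = self.n_experts * (sel.mean(0) * probs.mean(0)).sum()
        return out.reshape_as(x), aux

moe = MoEBlock()
y, aux = moe(torch.randn(2, 16, 960))
lm_loss = torch.tensor(5.9851)       # stand-in; the real LM loss is cross-entropy
total = lm_loss + 0.01 * aux         # Total = LM + 0.01 * Aux, as in the card
```

In the actual model, a block like this would replace the dense MLP at decoder layers [8, 12, 16, 20, 24, 28].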

Training loss: Total = 6.4659, LM = 5.9851, Aux = 48.0835

Validation loss: Total = 0.8298, LM = 0.7697, Aux = 6.0092
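
As a sanity check, both totals are consistent with Total = LM + 0.01 × Aux using the auxiliary loss weight above: 5.9851 + 0.01 × 48.0835 ≈ 6.4659 (train) and 0.7697 + 0.01 × 6.0092 ≈ 0.8298 (val).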

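Since this is a custom MoE rather than a stock Llama, loading it from the Hub presumably requires the repo's own modeling code; assuming such code is shipped (the card does not say), a hypothetical invocation would be:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "aldigobbler/smollmv2-360Mx4E-MoE-v0.1-unaligned-gates"
# trust_remote_code is only needed if the repo ships custom MoE modeling
# code -- an assumption, not something the card states.
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```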