---
license: mit
tags:
- sparse-autoencoder
- interpretability
- llama
- cognitive-actions
---
# LLaMA-3.1-8B Cognitive Actions SAE
This is a Sparse Autoencoder (SAE) trained on layer 11 activations from LLaMA-3.1-8B-Instruct using the FAST methodology.
## Model Details
- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
- **Layer**: 11
- **Dataset**: Cognitive Actions (7K examples)
- **SAE Architecture**: top-K SAE with M=256 features and K=8 active features per token
- **Methodology**: FAST (Finetuning-aligned Sequential Training)
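
To make the architecture numbers concrete, here is a minimal NumPy sketch of a generic top-K SAE forward pass. It reads M as the number of SAE features and K as the number kept active per token, an interpretation consistent with the 7.99 active features/token reported in this card. The function name, the random weights, and the plain ReLU encoder are illustrative assumptions, not the HypotheSAEs implementation or the weights of this model.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x with a generic top-k sparse autoencoder and reconstruct it.

    x: (n, d) activations, W_enc: (d, m), W_dec: (m, d).
    Returns (features, reconstruction); each row of features has
    at most k nonzero entries.
    """
    # ReLU pre-activations, shape (n, m).
    pre = np.maximum(x @ W_enc + b_enc, 0.0)
    # Column indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    # Keep only those k values; every other feature is set to zero.
    features = np.zeros_like(pre)
    top_vals = np.take_along_axis(pre, top_idx, axis=1)
    np.put_along_axis(features, top_idx, top_vals, axis=1)
    # Linear decoder back to the activation space.
    recon = features @ W_dec + b_dec
    return features, recon

# Hypothetical random weights with the shapes implied by this card:
# hidden size 4096 for Llama-3.1-8B, M=256 features, K=8 active.
rng = np.random.default_rng(0)
d, m, k = 4096, 256, 8
x = rng.standard_normal((4, d))
W_enc = rng.standard_normal((d, m)) / np.sqrt(d)
b_enc = np.zeros(m)
W_dec = rng.standard_normal((m, d)) / np.sqrt(m)
b_dec = np.zeros(d)

features, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
print(features.shape, (features != 0).sum(axis=1))
```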
## Performance
- **MSE**: 0.0065
- **Normalized MSE**: 0.0140
- **Active features/token**: 7.99
- **Dead neurons**: 0.00%
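
The card does not state how these metrics were computed. The sketch below shows one common set of definitions, offered as an assumption: normalized MSE is taken as the reconstruction MSE divided by the MSE of always predicting the per-dimension mean, and a feature counts as dead if it never activates on the evaluated tokens. The function name `sae_metrics` and the toy arrays are hypothetical.

```python
import numpy as np

def sae_metrics(x, recon, features):
    """Compute reconstruction and sparsity metrics for an SAE.

    x, recon: (n, d) inputs and reconstructions.
    features: (n, m) SAE feature activations.
    """
    mse = np.mean((x - recon) ** 2)
    # Assumed normalization: error of predicting the per-dimension mean.
    baseline = np.mean((x - x.mean(axis=0)) ** 2)
    active = features != 0
    return {
        "mse": float(mse),
        "normalized_mse": float(mse / baseline),
        # Average number of nonzero features per row (token).
        "active_per_token": float(active.sum(axis=1).mean()),
        # Fraction of features that never fire on any row.
        "dead_fraction": float((~active.any(axis=0)).mean()),
    }

# Toy data: the reconstruction is the per-dimension mean, so the
# normalized MSE is exactly 1 under the definition above.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
recon = np.tile(x.mean(axis=0), (len(x), 1))
features = np.array([[0.5, 0.0, 0.0, 0.0],
                     [0.2, 0.1, 0.0, 0.0]])
metrics = sae_metrics(x, recon, features)
print(metrics)
```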
## Usage
```python
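from hypothesaes.sae import load_model
# Load the trained SAE.
sae = load_model("Koalacrown/llama3.1-8b-it-cognitive-actions-sae-l11")
# `activations` must be supplied by the caller: layer-11 activations from Llama-3.1-8B-Instruct (hidden size 4096).
features = sae.get_activations(activations)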
```
## Training
Trained using [HypotheSAEs](https://github.com/DavidUdell/HypotheSAEs) with the following configuration:
- Epochs: 100
- Batch size: 512
- Learning rate: 0.0005
- Matryoshka prefixes: [64, 256]
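
The Matryoshka prefixes setting can be illustrated with a generic sketch of a nested reconstruction loss: for each prefix length p, the input is reconstructed from the first p features only, and the per-prefix errors are averaged. With prefixes [64, 256], the first 64 features are pushed to reconstruct well on their own, while the full 256 refine the result. This is an assumed, simplified form of the objective; the exact loss and weighting used in training may differ, and `matryoshka_loss` is a hypothetical name.

```python
import numpy as np

def matryoshka_loss(x, features, W_dec, b_dec, prefixes):
    """Average the reconstruction MSE over nested feature prefixes.

    x: (n, d) inputs, features: (n, m), W_dec: (m, d).
    Returns (mean loss, list of per-prefix losses).
    """
    losses = []
    for p in prefixes:
        # Reconstruct using only the first p features.
        recon = features[:, :p] @ W_dec[:p] + b_dec
        losses.append(float(np.mean((x - recon) ** 2)))
    return float(np.mean(losses)), losses

# Toy data built so that the full 256-feature reconstruction is exact,
# while the 64-feature prefix leaves a residual error.
rng = np.random.default_rng(0)
n, m, d = 16, 256, 32
features = np.maximum(rng.standard_normal((n, m)), 0.0)
W_dec = rng.standard_normal((m, d))
b_dec = np.zeros(d)
x = features @ W_dec + b_dec

total, per_prefix = matryoshka_loss(x, features, W_dec, b_dec, prefixes=[64, 256])
print(total, per_prefix)
```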
## Citation
If you use this SAE, please cite the FAST methodology and HypotheSAEs.