---
license: mit
tags:
- sparse-autoencoder
- interpretability
- llama
- cognitive-actions
---

# LLaMA-3.1-8B Cognitive Actions SAE

This is a sparse autoencoder (SAE) trained on layer 11 activations of Llama-3.1-8B-Instruct using the FAST methodology.

## Model Details

- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
- **Layer**: 11
- **Dataset**: Cognitive Actions (7K examples)
- **SAE Architecture**: M=256 features, K=8 active features per token
- **Methodology**: FAST (Finetuning-aligned Sequential Training)
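
To make the M=256, K=8 setting concrete, here is a minimal NumPy sketch of a top-K sparse autoencoder forward pass. It is an illustration only, not the HypotheSAEs implementation: the weights are random stand-ins for the trained parameters, the names `W_enc`, `b_enc`, `W_dec`, `b_dec`, `topk_encode`, and `decode` are hypothetical, and the ReLU-then-top-K ordering is an assumption. The input width of 4096 is the hidden size of Llama-3.1-8B.

```python
import numpy as np

M, K, D = 256, 8, 4096  # SAE features, active features per token, model hidden size

rng = np.random.default_rng(0)
# Random weights stand in for the trained parameters (illustration only).
W_enc = rng.normal(scale=D ** -0.5, size=(D, M))
b_enc = np.zeros(M)
W_dec = rng.normal(scale=M ** -0.5, size=(M, D))
b_dec = np.zeros(D)

def topk_encode(x):
    """Return (n, M) activations with at most K nonzero entries per row."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU pre-activations
    # Column indices of the K largest pre-activations in each row.
    idx = np.argpartition(pre, -K, axis=1)[:, -K:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=1), axis=1)
    return z

def decode(z):
    """Reconstruct the input from sparse feature activations."""
    return z @ W_dec + b_dec

x = rng.normal(size=(5, D))  # five fake activation vectors
z = topk_encode(x)
x_hat = decode(z)
print(z.shape, x_hat.shape, (z != 0).sum(axis=1))
```

Under this sketch a token can have fewer than K active features when fewer than K pre-activations are positive, which is one possible reason an average slightly below 8 could be reported.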

## Performance

- **MSE**: 0.0065
- **Normalized MSE**: 0.0140
- **Active features/token**: 7.99
- **Dead neurons**: 0.00%
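
The sketch below shows one plausible way metrics like these could be computed from inputs `x`, reconstructions `x_hat`, and feature activations `z`. The exact definitions used for this model card are not stated, so treat the choices here as assumptions: normalized MSE is taken as MSE divided by the variance of the inputs around their per-dimension mean, and a feature counts as dead if it never activates on the evaluation set. The function name `sae_metrics` is hypothetical.

```python
import numpy as np

def sae_metrics(x, x_hat, z):
    """Compute reconstruction and sparsity metrics for an SAE (assumed definitions)."""
    mse = np.mean((x - x_hat) ** 2)
    # Assumed normalization: variance of the inputs around their per-dimension mean.
    var = np.mean((x - x.mean(axis=0)) ** 2)
    active = z != 0
    return {
        "mse": float(mse),
        "normalized_mse": float(mse / var),
        # Average number of nonzero features per row (token).
        "active_per_token": float(active.sum(axis=1).mean()),
        # Fraction of features that never fire on this data.
        "dead_fraction": float((active.sum(axis=0) == 0).mean()),
    }

# Tiny hand-checkable example.
x = np.array([[0.0, 0.0], [2.0, 2.0]])
x_hat = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
metrics = sae_metrics(x, x_hat, z)
print(metrics)
```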

## Usage

```python
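from hypothesaes.sae import load_model

sae = load_model("Koalacrown/llama3.1-8b-it-cognitive-actions-sae-l11")
# `activations` must be supplied by the caller: layer 11 activations from
# meta-llama/Llama-3.1-8B-Instruct, one row per example (hidden size 4096).
features = sae.get_activations(activations)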
```

## Training

Trained using [HypotheSAEs](https://github.com/DavidUdell/HypotheSAEs) with the following configuration:

- Epochs: 100
- Batch size: 512
- Learning rate: 0.0005
- Matryoshka prefixes: [64, 256]
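
As a rough illustration of what the Matryoshka prefixes [64, 256] imply, the sketch below computes a reconstruction loss using only the first 64 features and again using all 256, then averages the two. This is a simplified assumption about the objective, not the HypotheSAEs code: how the per-prefix losses are weighted and combined during training is not stated here, and `matryoshka_loss` and its arguments are hypothetical names.

```python
import numpy as np

def matryoshka_loss(x, z, W_dec, b_dec, prefixes=(64, 256)):
    """Average the reconstruction MSE over nested prefixes of the feature set."""
    losses = []
    for p in prefixes:
        # Reconstruct using only the first p features and their decoder rows.
        x_hat = z[:, :p] @ W_dec[:p] + b_dec
        losses.append(float(np.mean((x - x_hat) ** 2)))
    return float(np.mean(losses)), losses

rng = np.random.default_rng(0)
M, D = 256, 16  # small D keeps the toy example fast
W_dec = rng.normal(size=(M, D))
b_dec = np.zeros(D)
z = np.abs(rng.normal(size=(4, M)))
x = z @ W_dec + b_dec  # inputs that the full 256-feature decoder reconstructs exactly

total, per_prefix = matryoshka_loss(x, z, W_dec, b_dec)
print(total, per_prefix)
```

In this toy setup the full-width loss is zero by construction, so the total reflects only how much is lost when the decoder is restricted to the first 64 features. Training against both prefixes encourages the earliest features to carry a usable reconstruction on their own.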

## Citation

If you use this SAE, please cite the FAST methodology and HypotheSAEs.