---
license: mit
tags:
- sparse-autoencoder
- interpretability
- llama
- cognitive-actions
---

# LLaMA-3.1-8B Cognitive Actions SAE

This is a Sparse Autoencoder (SAE) trained on layer 11 activations from LLaMA-3.1-8B-Instruct using the FAST methodology.

## Model Details

- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
- **Layer**: 11
- **Dataset**: Cognitive Actions (7K examples)
- **SAE Architecture**: M=256, K=8
- **Methodology**: FAST (Finetuning-aligned Sequential Training)

## Performance

- **MSE**: 0.0065
- **Normalized MSE**: 0.0140
- **Active features/token**: 7.99
- **Dead neurons**: 0.00%

## Usage

```python
from hypothesaes.sae import load_model

sae = load_model("Koalacrown/llama3.1-8b-it-cognitive-actions-sae-l11")
features = sae.get_activations(activations)
```

## Training

Trained using [HypotheSAEs](https://github.com/DavidUdell/HypotheSAEs) with the following configuration:

- Epochs: 100
- Batch size: 512
- Learning rate: 0.0005
- Matryoshka prefixes: [64, 256]

## Citation

If you use this SAE, please cite the FAST methodology and HypotheSAEs.
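
## Notes on Architecture and Metrics

The figures under Model Details and Performance can be made concrete with a small NumPy sketch. It is not the HypotheSAEs implementation. It assumes that M is the number of SAE features, that K is the number of features kept active per token by a top-K rule, and that a Matryoshka prefix means reconstructing from only the first N features. The weights are random stand-ins for the trained parameters, the input batch is synthetic rather than real layer-11 activations, and the names `encode`, `decode` and `metrics` are hypothetical helpers. The card does not say which normalization its "Normalized MSE" uses, so the variance-based one below is only one common choice.

```python
import numpy as np

D_MODEL = 4096  # hidden size of Llama-3.1-8B
M = 256         # number of SAE features (from the card)
K = 8           # active features per token (from the card)

rng = np.random.default_rng(0)

# Random weights stand in for the trained SAE parameters (assumption).
W_enc = rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, M))
b_enc = np.zeros(M)
W_dec = W_enc.T.copy()
b_dec = np.zeros(D_MODEL)


def encode(x):
    """Top-K encoder: keep the K largest ReLU pre-activations per row."""
    pre = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)
    # Indices of the K largest entries in each row (unordered within the K).
    top_idx = np.argpartition(pre, -K, axis=1)[:, -K:]
    acts = np.zeros_like(pre)
    top_vals = np.take_along_axis(pre, top_idx, axis=1)
    np.put_along_axis(acts, top_idx, top_vals, axis=1)
    return acts


def decode(acts, prefix=M):
    """Reconstruct from the first `prefix` features only."""
    return acts[:, :prefix] @ W_dec[:prefix] + b_dec


def metrics(x, x_hat, acts):
    """Compute the four quantities listed under Performance."""
    mse = np.mean((x - x_hat) ** 2)
    # Assumed normalization: error relative to predicting the per-dimension mean.
    baseline = np.mean((x - x.mean(axis=0)) ** 2)
    fired = acts > 0
    return {
        "mse": float(mse),
        "normalized_mse": float(mse / baseline),
        "active_per_token": float(fired.sum(axis=1).mean()),
        "dead_fraction": float((fired.sum(axis=0) == 0).mean()),
    }


# Synthetic batch standing in for layer-11 activations.
x = rng.normal(size=(512, D_MODEL))
acts = encode(x)
stats = metrics(x, decode(acts), acts)
stats_prefix64 = metrics(x, decode(acts, prefix=64), acts)
```

Under this reading, the number of active features per token can never exceed K, and it drops below K only for tokens where fewer than K pre-activations are positive, which would be consistent with the reported 7.99. The values produced by the random weights here say nothing about the trained SAE's quality.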