Hugging Face
Solshine/deception-sae-nanochat-d20
deception-detection
sparse-autoencoders
mechanistic-interpretability
ai-safety
nanochat
arxiv: 2503.07683
License: mit
main
Commit History
Initial public release: SAE weights, cfg, and model card
ba67ba2
Solshine committed 4 days ago