Scalable SAE Circuits in Gemma 2 9B

TLDR: We propose a novel approach to:

Scaling SAE Circuits to Large Models: We find circuits in Gemma 2 9B by placing residual SAEs at intervals throughout the model, rather than at every layer and for every SAE type.
Developing a Better Circuit-Finding Algorithm: Our method optimizes a binary mask over SAE latents, which proves significantly more effective than existing methods such as Attribution Patching or Integrated Gradients.
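
As a rough illustration of what a mask optimization over SAE latents can look like, here is a minimal toy sketch. It is not the code from this repository. It assumes a sigmoid-relaxed mask, a zero-ablation baseline for masked latents, a linear readout standing in for the rest of the model, and a simple sparsity penalty; the names `learn_mask`, `latents`, `readout`, and `sparsity` are hypothetical. The actual objective, ablation baseline, and binarization scheme used in the notebook may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def learn_mask(latents, readout, sparsity=0.05, lr=0.5, steps=2000):
    """Learn a keep/ablate mask over latents by plain gradient descent.

    Toy assumptions (not the repository's setup): the "model" output is a
    weighted sum of latent activations, and an ablated latent is set to zero.
    """
    # Output of the unmasked toy model, which the masked model should match.
    target = sum(w * z for w, z in zip(readout, latents))
    # One real-valued logit per latent; start with every latent softly kept.
    logits = [2.0] * len(latents)
    for _ in range(steps):
        mask = [sigmoid(l) for l in logits]
        out = sum(m * w * z for m, w, z in zip(mask, readout, latents))
        err = out - target
        for i, (m, w, z) in enumerate(zip(mask, readout, latents)):
            # Gradient of err**2 + sparsity * sum(mask) w.r.t. mask[i].
            grad_m = 2.0 * err * w * z + sparsity
            # Chain rule through the sigmoid relaxation.
            logits[i] -= lr * grad_m * m * (1.0 - m)
    # Binarize: keep a latent only if its logit ended up positive.
    return [1 if l > 0 else 0 for l in logits]


# Hypothetical activations and readout weights: latents 0 and 2 carry
# almost all of the output, latents 1 and 3 contribute very little.
latents = [2.0, 0.5, 1.5, 0.1]
readout = [1.5, 0.02, 1.0, 0.1]
mask = learn_mask(latents, readout)
print(mask)
```

The sparsity penalty pushes every logit down, while the reconstruction term pushes back only for latents whose removal changes the output noticeably, so the latents with negligible contribution should be the ones that end up ablated. A purely soft mask like this one can settle on fractional values in less well-behaved cases, which is one reason practical methods pair the relaxation with an explicit binarization scheme.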

Main Masking notebook:

Directions to run code

git clone https://github.com/NainaniJatinZ/ScalableSAECircuits.git

Open ScalableSAECircuits_Colab.ipynb and follow its setup instructions to train or evaluate any of the four tasks covered in the LessWrong post.
