NainaniJatinZ/ScalableSAECircuits

Scalable SAE Circuits in Gemma 2 9B

TLDR: We propose a novel approach to:

  • Scaling SAE Circuits to Large Models: We find circuits in Gemma 2 9B by placing residual SAEs at intervals throughout the model, rather than at every layer and for every component type.
  • Developing a Better Circuit-Finding Algorithm: Our method optimizes a binary mask over SAE latents, which we find to be significantly more effective than existing methods such as Attribution Patching or Integrated Gradients.
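
To make the masking idea concrete, here is a minimal toy sketch. It is not the code from the notebook. It assumes a single example, a hypothetical linear readout (`weights`) from latent activations to a scalar output, a sigmoid relaxation of the binary mask, and an L1-style penalty (`sparsity`) on the mask. All names and hyperparameters are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def learn_mask(acts, weights, sparsity=0.1, lr=0.5, steps=300):
    """Toy mask learning: one logit per latent, trained by gradient descent.

    acts     -- latent activations for one example (stand-in for SAE latents)
    weights  -- hypothetical linear readout from latents to a scalar output
    sparsity -- strength of the penalty that pushes mask entries toward 0
    """
    n = len(acts)
    logits = [0.0] * n  # every mask entry starts at sigmoid(0) = 0.5
    full = sum(w * a for w, a in zip(weights, acts))  # unmasked output
    for _ in range(steps):
        mask = [sigmoid(z) for z in logits]
        out = sum(w * a * m for w, a, m in zip(weights, acts, mask))
        err = out - full
        for i in range(n):
            # loss = err**2 + sparsity * sum(mask); chain rule through sigmoid
            dmask = mask[i] * (1.0 - mask[i])
            grad = (2.0 * err * weights[i] * acts[i] + sparsity) * dmask
            logits[i] -= lr * grad
    # Binarize: keep a latent only if its relaxed mask ended above 0.5
    return [1 if z > 0 else 0 for z in logits]


mask = learn_mask(acts=[1.0, 1.0, 1.0, 1.0], weights=[3.0, 0.005, 2.0, 0.0])
print(mask)  # [1, 0, 1, 0]
```

The two latents that carry almost all of the output are kept, while the sparsity penalty switches off the two that contribute little or nothing. In a real setting the readout would be the rest of the language model and the loss would compare masked and unmasked model behavior on the task.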

Main Masking notebook: Open In Colab

LessWrong post: Scaling Sparse Feature Circuit Finding to Gemma 9B

Directions to run code

  1. Get the data JSON files, either by downloading them directly or by cloning the repo:
     git clone https://github.com/NainaniJatinZ/ScalableSAECircuits.git
  2. Open ScalableSAECircuits_Colab.ipynb and follow the setup instructions to train and evaluate any of the four tasks covered in the LessWrong post.
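
To inspect the data files outside the notebook after cloning, a small sketch like the following may help. It assumes only that the JSON files live somewhere under the cloned directory. Their names and schema are not listed here, so the hypothetical helper `load_json_files` simply discovers whatever `.json` files exist.

```python
import json
from pathlib import Path


def load_json_files(root):
    """Return {relative_path: parsed_json} for every .json file under root."""
    root = Path(root)
    data = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data[path.relative_to(root).as_posix()] = json.load(f)
    return data
```

For example, `load_json_files("ScalableSAECircuits")` run from the directory containing the clone returns a dictionary keyed by each file's path relative to the repo root.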
