Project website | Paper on arXiv | Finetuned model and classifier weights
In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model.
We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely.
To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models.
Clone the repository and install the required dependencies:

```bash
git clone https://github.com/kevinlu4588/WhenAreConceptsErased.git
cd WhenAreConceptsErased
pip install -r requirements.txt
```

Navigate to the `src` directory and run the demo script:

```bash
cd src
python demo.py
```

This will:
- Run all available probes on the configured model(s)
- Save generated images under `data/results/`
- Automatically compute evaluation metrics (CLIP similarity and classification accuracy)
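The CLIP-similarity metric reduces to a cosine similarity between embedding vectors of a generated image and the target concept's text prompt. A minimal sketch of the core computation is below; the commented CLIP calls show how the embeddings would be obtained in practice, but the exact helper names and model checkpoint are illustrative, not the repository's implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# In the actual pipeline the embeddings come from a CLIP model, e.g.
# (illustrative; requires `transformers`, `torch`, and `pillow`):
#
#   from transformers import CLIPModel, CLIPProcessor
#   model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
#   processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
#   inputs = processor(text=["an airliner"], images=image, return_tensors="pt")
#   img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
#   txt_emb = model.get_text_features(input_ids=inputs["input_ids"])
#   score = cosine_similarity(img_emb[0].detach().numpy(),
#                             txt_emb[0].detach().numpy())
```

A higher score after probing suggests the erased model can still produce images resembling the target concept.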
To run the probes on your own model:

```bash
cd src
python runner.py --concept <your_concept> --pipeline_path <path_to_your_model>
```

For example:

```bash
python runner.py --concept airliner --pipeline_path DiffusionConceptErasure/esdx_airliner
```

This will run all probes by default. You can also specify individual probes:

```bash
python runner.py --concept airliner --pipeline_path <model_path> --probes standardpromptprobe noisebasedprobe
```

We provide several Jupyter notebooks that demonstrate our probing techniques and evaluation pipeline:
- Noise-based Probing: Walkthrough showing how we manipulate diffusion trajectories to reveal latent concept knowledge in erased models
- Classifier Guidance: Demonstration of applying classifier guidance to steer erased models back toward generating the target concept
- Demo Results Visualization: Visualization of probe demo results, including CLIP similarity scores, classification accuracies, and side-by-side comparisons across different erasure methods
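Conceptually, classifier guidance shifts the model's noise prediction at each denoising step using the gradient of a classifier's log-probability for the target concept. The toy sketch below illustrates a single guidance step with an analytic Gaussian "classifier" standing in for a trained concept classifier; all names and the guidance scale are illustrative, not the repository's implementation:

```python
import numpy as np

def classifier_log_prob_grad(x: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of log p(y|x) for a toy Gaussian classifier centered on `target`.
    The real probe backpropagates through a trained concept classifier instead."""
    return target - x  # gradient of -0.5 * ||x - target||^2

def guided_noise_pred(eps: np.ndarray, x: np.ndarray, target: np.ndarray,
                      sigma: float, scale: float) -> np.ndarray:
    """Classifier-guided noise prediction:
    eps_hat = eps - scale * sigma * grad_x log p(y | x)."""
    return eps - scale * sigma * classifier_log_prob_grad(x, target)

# Guidance nudges the denoising direction toward the target concept,
# even when the text-conditioned model no longer produces it on its own.
x = np.zeros(4)            # current noisy latent (toy)
eps = np.zeros(4)          # model's unguided noise prediction (toy)
target = np.ones(4)        # classifier's preferred region (toy)
print(guided_noise_pred(eps, x, target, sigma=1.0, scale=0.5))
```

If a modest guidance scale suffices to recover the concept, the erased model likely retains it internally.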
Quick start:

```bash
cd classifier_guidance
python e2e_concept_classifier.py "church, church building" "airliner" \
    --epochs 70 --batch-size 8 --output-dir "./my_classifiers"
```

Running the probes on an NVIDIA A6000 GPU, typical execution times for a single concept/model pair are:
| Probe | Time per Image | Total Time (30 prompts) |
|---|---|---|
| Standard Prompt | 2 seconds | 1 minute |
| Inpainting | 2 seconds | 1 minute |
| Diffusion Completion | 2 seconds | 1 minute |
| Noise-based | 2 seconds × 24 samples | 24 minutes |
| Classifier Guidance | 2 seconds × 24 samples | 24 minutes |
| Noise-based + Classifier | 2 seconds × 24 samples | 24 minutes |
| Textual Inversion | - | 60 minutes (training time per concept/model pair) |
If you find this work useful in your research, please consider citing:

```bibtex
@inproceedings{lu2025concepts,
  title={When Are Concepts Erased From Diffusion Models?},
  author={Lu, Kevin and Kriplani, Nicky and Gandikota, Rohit and Pham, Minh and Bau, David and Hegde, Chinmay and Cohen, Niv},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
```

Our work builds upon a growing body of research on concept erasure and targeted model editing, including:
- Erased Stable Diffusion (ESD) — model finetuning for concept removal
- Universal Concept Editing (UCE) — lightweight cross-attention projection
- Task Vectors — linear task steering in model weight space
- STEREO — ESD + Textual Inversion loop
- RECE — UCE + additional embedding projection
- UnlearnDiffAtk — adversarial prompt optimization
We thank the authors of these methods for laying the groundwork for this research.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions about the code or paper, please open an issue or contact [[email protected]].
