
When Are Concepts Erased From Diffusion Models? (NeurIPS 2025)

Project website | Paper on arXiv | Finetuned model and classifier weights

Figure 1

Overview

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model.

We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely.

To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models.
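The guidance process referenced in (i) is typically classifier-free guidance, which blends the model's conditional and unconditional noise predictions at each denoising step. A minimal NumPy sketch for intuition (our own notation, not code from this repository):

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine unconditional and conditional noise predictions.

    scale = 0 recovers the purely unconditional prediction;
    scale = 1 recovers the conditional one; larger scales push
    the sample further toward the conditioning concept.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy 2-element "noise predictions"
eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 1.0])
guided = classifier_free_guidance(eps_u, eps_c, 7.5)
```

Erasure methods that act on this mechanism can suppress a concept's guidance direction while leaving the unconditional distribution, and hence much of the concept knowledge, intact.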

Environment Setup

Clone the repository and install the Python dependencies:

git clone https://github.com/kevinlu4588/WhenAreConceptsErased.git
cd WhenAreConceptsErased
pip install -r requirements.txt

Running the Demo

Navigate to the src directory and run the demo script:

cd src
python demo.py

This will:

  1. Run all available probes on the configured model(s)
  2. Save generated images under data/results/
  3. Automatically compute evaluation metrics (CLIP similarity and classification accuracy)
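For reference, the CLIP similarity metric reduces to cosine similarity between image and text embeddings. A sketch assuming the embeddings have already been extracted with a CLIP model (the actual metric code lives in the repository):

```python
import numpy as np

def clip_similarity(image_emb, text_emb):
    """Cosine similarity between a CLIP image embedding and a
    CLIP text embedding (e.g. for the erased concept's prompt)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)
```

A high similarity between a probe's output image and the erased concept's text prompt suggests the concept was suppressed rather than removed.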

Running Probes on Your Model

To run the probes on your own model:

cd src
python runner.py --concept <your_concept> --pipeline_path <path_to_your_model>

For example:

python runner.py --concept airliner --pipeline_path DiffusionConceptErasure/esdx_airliner

This will run all probes by default. You can also specify individual probes:

python runner.py --concept airliner --pipeline_path <model_path> --probes standardpromptprobe noisebasedprobe

Key Notebooks

We provide several Jupyter notebooks that demonstrate our probing techniques and evaluation pipeline:

📊 Core Probe Implementations

  • Noise-based Probing: Walkthrough showing how we manipulate diffusion trajectories to reveal latent concept knowledge in erased models

  • Classifier Guidance: Demonstration of applying classifier guidance to steer erased models back toward generating the target concept
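As background for the noise-based probe, the standard DDPM forward process produces the intermediate latents from which denoising can be resumed. A simplified illustration (not the repository's implementation):

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng=None):
    """Noise a clean latent x0 to timestep t via the DDPM forward
    process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.

    Noise-based probing resumes the erased model's denoising from
    such an intermediate x_t (built from an image of the target
    concept) instead of from pure Gaussian noise.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
```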

📈 Results & Evaluation

  • Demo Results Visualization: Visualization of probe demo results, including CLIP similarity scores, classification accuracies, and side-by-side comparisons across different erasure methods.

Training new latent classifiers

Quick start:

cd classifier_guidance

python e2e_concept_classifier.py "church, church building" "airliner" \
  --epochs 70 --batch-size 8 --output-dir "./my_classifiers"
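Once trained, a latent classifier steers generation through the gradient of its log-probability for the target concept. For intuition, here is that gradient for a toy linear softmax classifier (a simplified sketch; the repository's classifiers operate on diffusion latents):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def grad_log_prob(W, x, y):
    """Gradient of log p(y | x) for a linear softmax classifier
    with logits z = W @ x.  In classifier guidance, this gradient
    for the target class y nudges each denoising step toward
    samples the classifier recognizes as the concept.
    """
    p = softmax(W @ x)
    return W[y] - p @ W  # d/dx [z_y - logsumexp(z)]
```

If an "erased" model can be steered back to the concept this way, the concept's generative pathways are still present in the weights.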

Probe Execution Times for Demo

On an NVIDIA A6000 GPU, typical execution times for a single concept/model pair are:

| Probe | Time per Image | Total Time (30 prompts) |
|---|---|---|
| Standard Prompt | 2 seconds | 1 minute |
| Inpainting | 2 seconds | 1 minute |
| Diffusion Completion | 2 seconds | 1 minute |
| Noise-based | 2 seconds × 24 samples | 24 minutes |
| Classifier Guidance | 2 seconds × 24 samples | 24 minutes |
| Noise-based + Classifier | 2 seconds × 24 samples | 24 minutes |
| Textual Inversion | – | 60 minutes (training time per concept/model pair) |

📖 Citation

If you find this work useful in your research, please consider citing:

@inproceedings{lu2025concepts,
  title={When Are Concepts Erased From Diffusion Models?},
  author={Kevin Lu and Nicky Kriplani and Rohit Gandikota and Minh Pham and David Bau and Chinmay Hegde and Niv Cohen},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}

🔗 Related Work

Our work builds upon a growing body of research on concept erasure and targeted model editing.

We thank the authors of these methods for laying the groundwork for this research.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📧 Contact

For questions about the code or paper, please open an issue or contact [[email protected]].
