Visualizing and Interpreting Neural Network Focus Regions

About This Fork

This fork extends the original Transformer-MM-Explainability repository by adding an evaluation and analysis pipeline on top of their DETR-based feature visualization method.

This study investigates the networks' decision-making processes by comparing the features learned by transformer networks on synthetic and real image data, analyzing their size, quantity, and spatial distribution with a feature visualization technique.

The notebook for visualizing the high-attention areas ("features") extracted with the method of the Transformer-MM-Explainability repository, and for occluding these features, is located in:

  • `DETR_feature_visualization_and_occlusion.ipynb`
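For orientation, here is a minimal sketch of those two steps, assuming the relevance map produced by the Transformer-MM-Explainability method is available as a 2D NumPy array normalized to [0, 1]; the threshold and fill value are illustrative choices, not the notebook's actual settings:

```python
import numpy as np
from scipy import ndimage

def extract_features(relevance: np.ndarray, threshold: float = 0.5):
    """Binarize a [0, 1]-normalized relevance map and label each
    connected high-attention region as one "feature"."""
    mask = relevance >= threshold
    labels, n_features = ndimage.label(mask)  # label 0 is background
    return labels, n_features

def occlude_features(image: np.ndarray, labels: np.ndarray, fill: float = 0.5):
    """Overwrite every pixel belonging to any feature with a constant
    fill value, producing the occluded input for re-inference."""
    occluded = image.copy()
    occluded[labels > 0] = fill  # boolean indexing also covers H x W x C images
    return occluded
```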

To access the details of these features and perform further analysis, use the notebook:

  • `DETR_results_feature_analysis.ipynb`
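As a rough illustration of the per-feature details analyzed (size, quantity, and spatial distribution, per the study description above), the sketch below computes the area and centroid of each labeled feature; the function and field names are assumptions, not the notebook's API:

```python
import numpy as np
from scipy import ndimage

def feature_statistics(labels: np.ndarray, n_features: int):
    """Area (pixel count) and centroid of every labeled feature."""
    ids = list(range(1, n_features + 1))
    areas = np.bincount(labels.ravel())[1:]  # index 0 is the background
    centroids = ndimage.center_of_mass(labels > 0, labels, index=ids)
    return [
        {"id": i, "area": int(a), "centroid": (float(cy), float(cx))}
        for i, a, (cy, cx) in zip(ids, areas, centroids)
    ]
```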

The fork then introduces an approach to evaluate model behavior through occlusion of these significant features. Additionally, a framework is provided for comparing the features identified on test images by models trained on synthetic data with those identified by models trained on real data, enabling an assessment of the differences between the models.

This evaluation is performed in:

  • `DETR_feature_and_occlusion_evaluation.ipynb`
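One simple way to quantify how strongly the focus regions of a synthetic-trained and a real-trained model overlap on the same test image is the intersection-over-union of their binary feature masks; this metric is chosen here for illustration and is not necessarily the comparison the notebook implements:

```python
import numpy as np

def feature_mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU of two binary feature masks (e.g. labels > 0 from two models):
    1.0 means identical focus regions, 0.0 means disjoint ones."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union) if union else 0.0
```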

The notebook for generating the plot for the occlusion analysis is located in:

  • `DETR_occlusion_analysis.ipynb`
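A minimal matplotlib sketch of such a plot; the axes shown (mean detection confidence versus the number of occluded features) are assumed for illustration and may differ from the figures the notebook produces:

```python
import matplotlib.pyplot as plt

def plot_occlusion_analysis(n_occluded, confidence_by_model):
    """Plot mean detection confidence against the number of occluded
    high-attention features, one curve per model."""
    for model_name, confidences in confidence_by_model.items():
        plt.plot(n_occluded, confidences, marker="o", label=model_name)
    plt.xlabel("Number of high-attention features occluded")
    plt.ylabel("Mean detection confidence")
    plt.title("Occlusion analysis")
    plt.legend()
    plt.tight_layout()
    plt.show()

# Hypothetical usage, one curve per training domain (numbers invented):
# plot_occlusion_analysis([0, 1, 2, 3],
#                         {"synthetic": [0.90, 0.60, 0.40, 0.20],
#                          "real":      [0.92, 0.80, 0.70, 0.50]})
```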

Furthermore, the study explores how enhancing the realism of synthetic images with generative artificial intelligence techniques affects model performance. Specifically, it investigates whether more realistic synthetic images improve the transfer of learned features to real-world applications, with the aim of narrowing the domain gap between synthetic and real-world images.

Resources

Pretrained Models

Download pretrained DETR models from:

Dropbox

Datasets

The datasets used in this work are obtained by cloning the following repository into a datasets/ folder within this repository:

small-load-carrier-dataset

This repository contains the datasets referenced in the paper:

  • Real: A dataset containing real images of small load carriers and a small storage box with material properties similar to the small load carrier.
  • Storage Box: The baseline synthetic dataset containing images generated using Blender and 3D meshes of small load carriers and a small storage box with similar material properties.
  • SD-V1: A baseline-extended dataset augmenting the baseline using Stable Diffusion.
  • SD-V2: A baseline-extended dataset augmenting the baseline using Stable Diffusion, focusing on photorealism.
  • Testvideo: A dataset containing images of the small load carrier, a small storage box with similar material properties, and other distracting objects in a real warehouse environment.

Each dataset contains 500 images, split into train/ and val/ subsets.
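After cloning, the layout is expected to look roughly like this (the placeholder names are illustrative; the dataset repository defines the actual directory names):

```
datasets/
└── small-load-carrier-dataset/
    ├── <dataset>/        # one folder per dataset listed above
    │   ├── train/
    │   └── val/
    └── ...
```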

Dataset visualization

Associated paper: “Visualizing and Interpreting Neural Network Focus Regions: A Comparative Study of Vision Transformers on Synthetic and Real Data” (https://doi.org/10.1007/978-3-032-02813-6_22)

Citation

If you use this code, please link to this repository and cite the associated paper:

@InProceedings{10.1007/978-3-032-02813-6_22,
   author    = {Trescher, Denis and Haag, Waldemar and Schröder, Enrico},
   editor    = {Braun, Tanya and Paassen, Benjamin and Stolzenburg, Frieder},
   title     = {Visualizing and Interpreting Neural Network Focus Regions: A Comparative Study of Vision Transformers on Synthetic and Real Data},
   booktitle = {KI 2025: Advances in Artificial Intelligence},
   year      = {2026},
   publisher = {Springer Nature Switzerland},
   address   = {Cham},
   pages     = {270--277},
   isbn      = {978-3-032-02813-6},
   doi       = {10.1007/978-3-032-02813-6_22}
}

Original visualization code:

@InProceedings{Chefer_2021_ICCV,
   author    = {Chefer, Hila and Gur, Shir and Wolf, Lior},
   title     = {Generic Attention-Model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers},
   booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
   month     = {October},
   year      = {2021},
   pages     = {397--406}
}

We acknowledge the original authors for their foundational work on explainability for transformer models.

This project builds on their Transformer-MM-Explainability repo, specifically the DETR visualization component.

