GitHub - TrescherDe/Visualizing-and-Interpreting-Neural-Network-Focus-Regions: [ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

About This Fork

This fork extends the original Transformer-MM-Explainability repository by adding an evaluation and analysis pipeline on top of their DETR-based feature visualization method.

This study compares the features learned by transformer networks on synthetic and real image data by analyzing their size, quantity, and spatial distribution using a feature visualization technique to investigate the networks' decision-making processes.

The notebook that visualizes the high-attention areas ("features") extracted with the method from the Transformer-MM-Explainability repository, and that applies occlusion to these features, is:

`DETR_feature_visualization_and_occlusion.ipynb`

To access the details of these features and perform further analysis, use the notebook:

`DETR_results_feature_analysis.ipynb`
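As a rough illustration of what analyzing feature size, quantity, and spatial distribution can involve, the sketch below thresholds a toy relevance map and groups the remaining cells into 4-connected regions. It is not taken from the notebooks: the function name `find_features`, the threshold value, and the toy map are hypothetical, and the code uses plain Python lists so it runs without dependencies (an array library would be the usual choice in practice).

```python
from collections import deque


def find_features(relevance, threshold):
    """Group above-threshold cells into 4-connected regions.

    `relevance` is a 2D list of floats standing in for a relevance map.
    Returns one dict per region with its size, centroid, and cells.
    """
    rows, cols = len(relevance), len(relevance[0])
    seen = [[False] * cols for _ in range(rows)]
    features = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or relevance[r][c] < threshold:
                continue
            # Breadth-first flood fill from this unvisited high-relevance cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and relevance[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            features.append({
                "size": len(cells),  # number of cells in the region
                "centroid": (sum(y for y, _ in cells) / len(cells),
                             sum(x for _, x in cells) / len(cells)),
                "cells": cells,
            })
    return features


# Hypothetical 4x4 relevance map with two separate high-relevance regions.
toy_map = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
features = find_features(toy_map, threshold=0.5)
print(len(features), [f["size"] for f in features])
```

The number of regions gives the feature count, `size` the feature area, and `centroid` a simple handle on where features fall in the image.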

The study then introduces an approach for evaluating model behavior by occluding these significant features. It also provides a framework for comparing the features that models trained on synthetic data and models trained on real data identify on test images, so that the differences between the models can be assessed.

This evaluation is performed in:

`DETR_feature_and_occlusion_evaluation.ipynb`
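The general idea of occlusion-based evaluation can be sketched as follows: mask the cells belonging to a feature, score the image again, and record how much the score drops. This is a minimal sketch under stated assumptions, not the notebook's implementation. The helpers `occlude` and `confidence_drop` are hypothetical, and `toy_score` is a stand-in for a real detector's confidence.

```python
def occlude(image, cells, fill=0.0):
    """Return a copy of `image` with the given (row, col) cells set to `fill`."""
    occluded = [row[:] for row in image]  # copy so the input stays untouched
    for y, x in cells:
        occluded[y][x] = fill
    return occluded


def confidence_drop(score_fn, image, cells, fill=0.0):
    """Score of the original image minus the score after occluding `cells`."""
    return score_fn(image) - score_fn(occlude(image, cells, fill))


def toy_score(image):
    """Toy stand-in for a detector: 'confidence' is the mean pixel intensity."""
    flat = [v for row in image for v in row]
    return sum(flat) / len(flat)


# Hypothetical grayscale image and one feature (a list of cell coordinates).
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
feature_cells = [(0, 0), (0, 1), (1, 0)]
drop = confidence_drop(toy_score, image, feature_cells)
print(drop)
```

A large drop suggests the occluded region mattered to the score. With a real model, `score_fn` would wrap a forward pass and return the confidence of the detection of interest.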

The notebook for generating the plot for the occlusion analysis is located in:

`DETR_occlusion_analysis.ipynb`

Furthermore, the study explores the impact of enhancing the realism of synthetic images using generative artificial intelligence techniques on model performance. Specifically, it investigates whether more realistic synthetic images influence the transfer of learned features to real-world applications, with the aim of addressing the domain gap between synthetic and real-world images.

Resources

Pretrained Models

Download pretrained DETR models from:

Dropbox

Datasets

To obtain the datasets used in this work, clone the following repository into a datasets/ folder within this repository:

small-load-carrier-dataset

The small-load-carrier-dataset repository contains the datasets referenced in the paper:

Real: A dataset containing real images of small load carriers and a small storage box with material properties similar to the small load carrier.
Storage Box: The baseline synthetic dataset containing images generated using Blender and 3D meshes of small load carriers and a small storage box with similar material properties.
SD-V1: A baseline-extended dataset augmenting the baseline using Stable Diffusion.
SD-V2: A baseline-extended dataset augmenting the baseline using Stable Diffusion, focusing on photorealism.
Testvideo: A dataset containing images of the small load carrier, a small storage box with similar material properties, and other distracting objects in a real warehouse environment.

Each dataset contains 500 images, split into train/ and val/ subsets.
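After cloning, a quick check of the layout can confirm that each dataset has the expected train/ and val/ subsets. The sketch below counts image files per split. The exact folder nesting inside datasets/, the dataset folder names, and the image file extensions are assumptions, so adjust the root path and `IMAGE_SUFFIXES` to match what the clone actually produces.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def count_split_images(dataset_dir):
    """Count image files under the train/ and val/ subfolders of one dataset."""
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(dataset_dir) / split
        if not split_dir.is_dir():
            counts[split] = 0  # a missing split is reported as zero images
            continue
        counts[split] = sum(
            1 for p in split_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def summarize(datasets_root):
    """Map each dataset folder under `datasets_root` to its per-split counts."""
    root = Path(datasets_root)
    if not root.is_dir():
        return {}
    return {
        d.name: count_split_images(d)
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    # "datasets" is an assumed root; point this at the folder that holds the datasets.
    for name, counts in summarize("datasets").items():
        print(name, counts, "total:", sum(counts.values()))
```

If the totals do not match the 500 images stated above, the root path or the assumed extensions are the first things to check.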

Associated paper: “Visualizing and Interpreting Neural Network Focus Regions: A Comparative Study of Vision Transformers on Synthetic and Real Data” (https://doi.org/10.1007/978-3-032-02813-6_22)

Citation

If you use this code, please link to this repository.

@InProceedings{10.1007/978-3-032-02813-6_22,
author    = {Trescher, Denis and Haag, Waldemar and Schröder, Enrico},
editor    = {Braun, Tanya and Paassen, Benjamin and Stolzenburg, Frieder},
title     = {Visualizing and Interpreting Neural Network Focus Regions: A Comparative Study of Vision Transformers on Synthetic and Real Data},
booktitle = {KI 2025: Advances in Artificial Intelligence},
year      = {2026},
publisher = {Springer Nature Switzerland},
address   = {Cham},
pages     = {270--277},
isbn      = {978-3-032-02813-6},
doi       = {10.1007/978-3-032-02813-6_22}
}

Original visualization code:

@InProceedings{Chefer_2021_ICCV,
   author    = {Chefer, Hila and Gur, Shir and Wolf, Lior},
   title     = {Generic Attention-Model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers},
   booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
   month     = {October},
   year      = {2021},
   pages     = {397-406}
}

We acknowledge the original authors for their foundational work on explainability for transformer models.

This project builds on their Transformer-MM-Explainability repo, specifically the DETR visualization component.

The original project also credits:

VisualBERT implementation is based on the MMF framework
LXMERT implementation is based on the official LXMERT repo and Hugging Face Transformers
DETR implementation is based on the official DETR repo
CLIP implementation is based on the official CLIP repo
The CLIP Hugging Face Spaces demo was made by Paul Hilders, Danilo de Goede, and Piyush Bagad from the University of Amsterdam as part of their final project
