AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

arXiv | Hugging Face | Colab Demo 1 | Colab Demo 2

This repository is the official open-source implementation of AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM by Sunghyun Ahn*, Youngwan Jo*, Kijung Lee, Sein Kwon, Inpyo Hong, and Sanghyun Park. (*equal contribution)

Description

Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for each new environment, which requires machine-learning expertise, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes a customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects the frames in a video that contain the specified event. We implemented AnyAnomaly effectively using context-aware visual question answering (VQA) without fine-tuning the large vision-language model (LVLM). To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets.

fig-1

Context-aware VQA

Comparison of the proposed model with the baseline. Both models perform C-VAD, but the baseline operates with frame-level VQA, whereas the proposed model employs segment-level Context-Aware VQA. Context-Aware VQA performs VQA with additional contexts that describe the image. To enhance the object-analysis and action-understanding capabilities of the LVLM, we propose the Position Context and the Temporal Context.

fig-2
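To make the segment-level Context-Aware VQA flow concrete, below is a minimal Python sketch. It is not the repository's code: the LVLM wrapper lvlm.ask(images, prompt), the prompt wording, and the score parsing are illustrative assumptions.

import re
from typing import Sequence

def context_aware_vqa(lvlm, segment: Sequence, anomaly_text: str,
                      position_context: str, temporal_context: str) -> float:
    # Score one video segment for a user-defined anomaly (0 = normal, 1 = anomalous).
    prompt = (
        f"Abnormal event to detect: '{anomaly_text}'.\n"
        f"Position context (object locations): {position_context}\n"
        f"Temporal context (scene changes over time): {temporal_context}\n"
        "On a scale from 0 to 1, how likely is it that this event occurs "
        "in the given frames? Answer with a single number."
    )
    answer = lvlm.ask(segment, prompt)       # one query per segment, not per frame
    match = re.search(r"\d*\.?\d+", answer)  # pull the first number from the reply
    return float(match.group()) if match else 0.0

The key difference from the frame-level baseline is that the LVLM sees a whole segment together with the two contexts, so the same user-defined text can be matched against both appearance and motion cues.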

Results

Table 1 and Table 2 present the evaluation results on the C-VAD datasets (C-ShT, C-Ave). The proposed model achieved performance improvements of 9.88% and 13.65% over the baseline on the C-ShT and C-Ave datasets, respectively. Specifically, it showed improvements of 14.34% and 8.2% in the action class, and 3.25% and 21.98% in the appearance class.

fig-3
fig-3
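As a point of reference for the numbers above, below is a minimal sketch of the frame-level evaluation commonly used for VAD; treating the metric as AUC and the file layout as .npy arrays are assumptions for illustration, not a description of the repository's evaluation code.

import glob
import numpy as np
from sklearn.metrics import roc_auc_score

def frame_level_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    # scores: per-frame anomaly scores; labels: per-frame 0/1 ground-truth labels.
    return roc_auc_score(labels, scores)

# Hypothetical usage with one .npy array per test video:
# scores = np.concatenate([np.load(p) for p in sorted(glob.glob("scores/*.npy"))])
# labels = np.concatenate([np.load(p) for p in sorted(glob.glob("ground_truth/*.npy"))])
# print(f"frame-level AUC: {frame_level_auc(scores, labels):.4f}")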

Qualitative Evaluation

  • Anomaly Detection in Diverse Scenarios

    Text                   | Demo
    Jumping-Falling-Pickup | c5-2
    Bicycle-Running        | c6-2
    Bicycle-Stroller       | c7
  • Anomaly Detection in Complex Scenarios

    Text                    | Demo
    Driving outside lane    | c4
    People and car accident | c1
    Jaywalking              | c2
    Walking drunk           | c3

Datasets

  • We processed the Shanghai Tech Campus (ShT) and CUHK Avenue (Ave) datasets to create the labels for the C-ShT and C-Ave datasets. These labels can be found in the ground_truth folder. To test on the C-ShT and C-Ave datasets, first download the ShT and Ave datasets and store them under the directory given by 'data_root'.
  • You can specify the dataset path by editing 'data_root' in config.py (see the illustrative sketch after the table below).
    CUHK Avenue   | Shanghai Tech. | Quick Download
    Official Site | Official Site  | GitHub Page
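The repository's config.py is not reproduced here, but a minimal sketch of the 'data_root' setting it expects might look like the following; every name other than data_root is a hypothetical illustration of how the downloaded datasets could be laid out.

import os

data_root = "/path/to/datasets"  # edit this to point at your local download location

# Hypothetical layout under data_root (adjust to match your actual folders):
# data_root/shanghaitech/testing/frames/...
# data_root/avenue/testing/frames/...
shanghaitech_dir = os.path.join(data_root, "shanghaitech")
avenue_dir = os.path.join(data_root, "avenue")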

1. Requirements and Installation For Chat-UniVi

  • Once the datasets and the Chat-UniVi model are ready, you can move the provided tutorial files to the main directory and run them directly!
  • Chat-UniVi: [GitHub]
  • weights: Chat-UniVi 7B [Huggingface], Chat-UniVi 13B [Huggingface]
  • Install required packages:
git clone https://github.com/PKU-YuanGroup/Chat-UniVi
cd Chat-UniVi
conda create -n chatunivi python=3.10 -y
conda activate chatunivi
pip install --upgrade pip
pip install -e .
pip install numpy==1.24.3

# Download the Model (Chat-UniVi 7B)
mkdir weights
cd weights
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/Chat-UniVi/Chat-UniVi

# Download extra packages
cd ../../
pip install -r requirements.txt

Command

  • C-Ave type: [too_close, bicycle, throwing, running, dancing]
  • C-ShT type: [car, bicycle, fighting, throwing, hand_truck, running, skateboarding, falling, jumping, loitering, motorcycle]
  • C-Ave type (multiple): [throwing-too_close, running-throwing]
  • C-ShT type (multiple): [stroller-running, stroller-loitering, stroller-bicycle, skateboarding-bicycle, running-skateboarding, running-jumping, running-bicycle, jumping-falling-pickup, car-bicycle]
# Baseline model (Chat-UniVi) → C-ShT
python -u vad_chatunivi.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT
python -u vad_proposed_chatunivi.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT, diverse anomaly scenarios
python -u vad_proposed_chatunivi.py --dataset=shtech --multiple=True --type=jumping-falling-pickup
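To sweep every C-ShT anomaly type in one go, a small helper script (not part of the repository) can shell out to the same command shown above, one type at a time:

import subprocess

CSHT_TYPES = ["car", "bicycle", "fighting", "throwing", "hand_truck", "running",
              "skateboarding", "falling", "jumping", "loitering", "motorcycle"]

for anomaly_type in CSHT_TYPES:
    # Run the proposed Chat-UniVi pipeline for a single user-defined anomaly type.
    subprocess.run(
        ["python", "-u", "vad_proposed_chatunivi.py",
         "--dataset=shtech", f"--type={anomaly_type}"],
        check=True,  # stop immediately if any run fails
    )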

2. Requirements and Installation For MiniCPM-V

  • MiniCPM-V: [GitHub]
  • Install required packages:
git clone https://github.com/OpenBMB/MiniCPM-V.git
cd MiniCPM-V
conda create -n MiniCPM-V python=3.10 -y
conda activate MiniCPM-V
pip install -r requirements.txt

# Download extra packages
cd ../
pip install -r requirements.txt

Command

# Baseline model (MiniCPM-V) → C-ShT
python -u vad_MiniCPM.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT
python -u vad_proposed_MiniCPM.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT, diverse anomaly scenarios
python -u vad_proposed_MiniCPM.py --dataset=shtech --multiple=True --type=jumping-falling-pickup

Citation

If you use our work, please consider citing:

@article{ahn2025anyanomaly,
  title={AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM},
  author={Ahn, Sunghyun and Jo, Youngwan and Lee, Kijung and Kwon, Sein and Hong, Inpyo and Park, Sanghyun},
  journal={arXiv preprint arXiv:2503.04504},
  year={2025}
}

Contact

Should you have any questions, please create an issue on this repository or contact me at [email protected].
