This repository is the official open-source implementation of AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM by Sunghyun Ahn*, Youngwan Jo*, Kijung Lee, Sein Kwon, Inpyo Hong, and Sanghyun Park. (*equal contribution)
Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for each new environment, which requires machine-learning expertise, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes a customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects frames containing the specified event in a video. We implemented AnyAnomaly effectively using context-aware visual question answering (VQA) without fine-tuning the large vision language model (LVLM). To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets.

Comparison of the proposed model with the baseline. Both models perform C-VAD, but the baseline operates with frame-level VQA, whereas the proposed model employs segment-level context-aware VQA. Context-aware VQA performs VQA with additional contexts that describe an image. To enhance the object-analysis and action-understanding capabilities of the LVLM, we propose Position Context and Temporal Context (a minimal sketch of how the two contexts can be combined follows the tutorial links below).
- Position Context Tutorial: [Google Colab]
- Temporal Context Tutorial: [Google Colab]
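To make the idea concrete, here is a minimal sketch of how the two contexts could be folded into a single VQA prompt. The function names and prompt wording are illustrative assumptions, not the repository's actual implementation; see the Colab tutorials above for the real pipeline.

```python
# Illustrative sketch of segment-level context-aware VQA prompting.
# All names here (position_context, temporal_context, build_vqa_prompt) are
# hypothetical and do not correspond to the repository's actual API.

def position_context(detections):
    """Describe where key objects appear in the key frame (object analysis)."""
    return "; ".join(f"a {label} near the {region} of the frame"
                     for label, region in detections)

def temporal_context(frame_summaries):
    """Summarize how the scene evolves across the segment (action understanding)."""
    return " -> ".join(frame_summaries)

def build_vqa_prompt(user_text, detections, frame_summaries):
    """Combine the user-defined anomaly text with both contexts into one question."""
    return (
        f"Position context: {position_context(detections)}.\n"
        f"Temporal context: {temporal_context(frame_summaries)}.\n"
        f"Question: How likely is it (0 to 1) that this segment contains "
        f"the event '{user_text}'?"
    )

if __name__ == "__main__":
    prompt = build_vqa_prompt(
        user_text="bicycle",
        detections=[("person", "center"), ("bicycle", "bottom-left")],
        frame_summaries=[
            "a person walks into the scene",
            "the person rides a bicycle",
            "the bicycle crosses the walkway",
        ],
    )
    # The resulting prompt would be passed to the LVLM together with the segment's frames.
    print(prompt)
```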
Table 1 and Table 2 present the evaluation results on the C-VAD datasets (C-ShT, C-Ave). The proposed model achieved performance improvements of 9.88% and 13.65% over the baseline on the C-ShT and C-Ave datasets, respectively. Specifically, it showed improvements of 14.34% and 8.2% in the action class, and 3.25% and 21.98% in the appearance class.


- Anomaly Detection in Diverse Scenarios

| Text | Demo |
|---|---|
| Jumping-Falling-Pickup | ![]() |
| Bicycle-Running | ![]() |
| Bicycle-Stroller | ![]() |

- Anomaly Detection in Complex Scenarios

| Text | Demo |
|---|---|
| Driving outside lane | ![]() |
| People and car accident | ![]() |
| Jaywalking | ![]() |
| Walking drunk | ![]() |
- We processed the ShanghaiTech Campus (ShT) and CUHK Avenue (Ave) datasets to create the labels for the C-ShT and C-Ave datasets. These labels can be found in the `ground_truth` folder. To test on the C-ShT and C-Ave datasets, first download the ShT and Ave datasets and store them in the directory specified by `data_root`.
- You can set the dataset path by editing `data_root` in `config.py` (a sketch follows the table below).
| CUHK Avenue | Shanghai Tech. | Quick Download |
|---|---|---|
| Official Site | Official Site | GitHub Page |
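For reference, a minimal `config.py` could look like the sketch below; only the `data_root` field is mentioned in this README, and the example path is an assumption.

```python
# config.py — illustrative sketch; only `data_root` is referenced in this README.
# Point it at the directory where the downloaded ShT and Ave datasets are stored.
data_root = "/path/to/datasets"  # assumed example path; adjust to your environment
```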
- Once the datasets and the Chat-UniVi model are ready, you can move the provided tutorial files to the main directory and run them directly!
- Chat-UniVi: [GitHub]
- Weights: Chat-UniVi 7B [Huggingface], Chat-UniVi 13B [Huggingface]
- Install required packages:
```bash
git clone https://github.com/PKU-YuanGroup/Chat-UniVi
cd Chat-UniVi
conda create -n chatunivi python=3.10 -y
conda activate chatunivi
pip install --upgrade pip
pip install -e .
pip install numpy==1.24.3

# Download the Model (Chat-UniVi 7B)
mkdir weights
cd weights
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/Chat-UniVi/Chat-UniVi

# Download extra packages
cd ../../
pip install -r requirements.txt
```

- C-Ave type: `[too_close, bicycle, throwing, running, dancing]`
- C-ShT type: `[car, bicycle, fighting, throwing, hand_truck, running, skateboarding, falling, jumping, loitering, motorcycle]`
- C-Ave type (multiple): `[throwing-too_close, running-throwing]`
- C-ShT type (multiple): `[stroller-running, stroller-loitering, stroller-bicycle, skateboarding-bicycle, running-skateboarding, running-jumping, running-bicycle, jumping-falling-pickup, car-bicycle]`
```bash
# Baseline model (Chat-UniVi) → C-ShT
python -u vad_chatunivi.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT
python -u vad_proposed_chatunivi.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT, diverse anomaly scenarios
python -u vad_proposed_chatunivi.py --dataset=shtech --multiple=True --type=jumping-falling-pickup
```

- MiniCPM-V: [GitHub]
- Install required packages:
```bash
git clone https://github.com/OpenBMB/MiniCPM-V.git
cd MiniCPM-V
conda create -n MiniCPM-V python=3.10 -y
conda activate MiniCPM-V
pip install -r requirements.txt

# Download extra packages
cd ../
pip install -r requirements.txt
```

```bash
# Baseline model (MiniCPM-V) → C-ShT
python -u vad_MiniCPM.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT
python -u vad_proposed_MiniCPM.py --dataset=shtech --type=falling
# Proposed model (AnyAnomaly) → C-ShT, diverse anomaly scenarios
python -u vad_proposed_MiniCPM.py --dataset=shtech --multiple=True --type=jumping-falling-pickup
```

If you use our work, please consider citing:
```bibtex
@article{ahn2025anyanomaly,
  title={AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM},
  author={Ahn, Sunghyun and Jo, Youngwan and Lee, Kijung and Kwon, Sein and Hong, Inpyo and Park, Sanghyun},
  journal={arXiv preprint arXiv:2503.04504},
  year={2025}
}
```

Should you have any questions, please create an issue on this repository or contact me at [email protected].






