Integrating Vision and Language: An Improved VAD Model

Abstract

Automatic anomaly detection in video surveillance is crucial for public and private safety. However, it is challenging because of unclear abnormal events, limited labeled data, and mismatches between different types of data. Traditional video anomaly detection methods mainly focus on spatiotemporal visual features. They often ignore semantic information and interactions between different data types. Additionally, many multimodal approaches use basic fusion methods that do not solve the alignment problems between these types of data. To address these issues, we propose a multimodal framework that includes a Hierarchical Multi-scale Temporal Network (H-MSTN). This network models short-, medium-, and long-term dependencies in visual and textual data. A lightweight cross-modal attention module makes sure the semantics align. Meanwhile, a Multimodal Attention-Based Fusion Transformer (MAFT) refines cross-modal representations in real time. We evaluate this framework using the UCF-Crime and XD-Violence benchmarks. The proposed method achieves 92.42% AUC on UCF-Crime and 88.63% AP on XD-Violence with significantly lower computational cost and faster inference than recent multimodal baselines such as ReFLIP-VAD. These results demonstrate a strong efficiency–accuracy trade-off for real-time deployment while maintaining competitive or improved performance over prior methods such as MVAD and TEVAD.
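The abstract does not say how the lightweight cross-modal attention module is built. As a rough illustration only, the sketch below assumes standard scaled dot-product attention in which each visual clip feature acts as a query over text token features; the function names and the toy feature values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(visual, text):
    """Attend from visual features (queries) to text features (keys and values).

    visual: list of T vectors of dimension d (hypothetical clip features).
    text:   list of N vectors of dimension d (hypothetical token features).
    Returns T vectors of dimension d; each is a weighted average of text vectors.
    """
    d = len(text[0])
    scale = math.sqrt(d)
    attended = []
    for query in visual:
        # Similarity of this visual feature to every text feature.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in text]
        weights = softmax(scores)
        # Weighted average of the text features, one output dimension at a time.
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, text)) for j in range(d)]
        )
    return attended


# Toy example: one visual feature that matches the first text feature.
visual_features = [[1.0, 0.0]]
text_features = [[10.0, 0.0], [0.0, 10.0]]
aligned = cross_modal_attention(visual_features, text_features)
```

In this toy case the output is dominated by the first text feature, which is the behaviour one would expect from an alignment step. A real module would add learned query, key, and value projections, which are omitted here.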

Similar Papers
  • Conference Article
  • Cited by 1
  • 10.52591/lxai2023061810
Anomaly Detection in Surveillance Videos Using Spatio-Temporal Context Information
  • Jun 18, 2023
  • Hernan Benitez-Restrepo + 1 more

Several computer vision algorithms have been proposed to detect anomalous activities (robberies, murders, vandalism, among others) in videos. According to the learning approach, they can be classified into probabilistic distribution modeling, sparse coding, and deep learning based methods. The main drawbacks of these approaches are (i) extraction of low-level features that do not capture complex behaviors of instances on the scene, (ii) generation of features from irrelevant regions, (iii) overlooking of relationships among objects, and (iv) omission of long-term dependencies. To solve these issues, we propose a deep learning architecture that leverages the relationships among objects. It achieves this by using an attention mechanism and learning long-term dependencies using a multilayer recurrent neural network (multilayer LSTM). An AUC score of 0.749 on the UCF-Crime dataset confirms that the proposed algorithm competes effectively against several state-of-the-art approaches for anomaly detection in surveillance videos. It also explains the relationship between regions in the video frames and the anomaly detections.

  • Conference Article
  • Cited by 27
  • 10.1145/3394171.3416298
Enhancing Anomaly Detection in Surveillance Videos with Transfer Learning from Action Recognition
  • Oct 12, 2020
  • Kun Liu + 4 more

Anomaly detection in surveillance videos, as a special case of video-based action recognition, has been of increasing interest in the multimedia community and public security. Action recognition in videos faces challenges such as cluttered backgrounds and varying illumination conditions. Beyond these difficulties, detecting anomalies in surveillance videos poses several unique problems. For example, the lack of sufficient training samples is one of the main challenges. In this paper, we propose to utilize transfer learning to leverage the good results from action recognition for anomaly detection in surveillance videos. More specifically, we explore techniques based on action recognition models from the following aspects: training samples, temporal modules for action recognition, and network backbones. We draw several conclusions. First, more training samples from surveillance videos lead to higher classification accuracy. Second, stronger temporal modules designed for recognizing actions and deeper networks do not achieve better results. This conclusion is reasonable since deeper networks tend to overfit, especially on a small-scale training set. In addition, to distinguish hard examples from normal activities, we separately train a neural network to classify the hard category and normal events. We then fuse the binary network and the previous network to generate the final prediction for general anomaly detection. On the CitySCENE benchmark, our framework achieves promising performance and obtains the first prize for general anomaly detection and the second prize for specific anomaly detection.

  • Research Article
  • Cited by 127
  • 10.3390/s23115024
Deep Learning-Based Anomaly Detection in Video Surveillance: A Survey.
  • May 24, 2023
  • Sensors (Basel, Switzerland)
  • Huu-Thanh Duong + 2 more

Anomaly detection in video surveillance is a highly developed subject that is attracting increased attention from the research community. There is great demand for intelligent systems with the capacity to automatically detect anomalous events in streaming videos. Due to this, a wide variety of approaches have been proposed to build an effective model that would ensure public security. There has been a variety of surveys of anomaly detection, such as of network anomaly detection, financial fraud detection, human behavioral analysis, and many more. Deep learning has been successfully applied to many aspects of computer vision. In particular, the strong growth of generative models means that these are the main techniques used in the proposed methods. This paper aims to provide a comprehensive review of the deep learning-based techniques used in the field of video anomaly detection. Specifically, deep learning-based approaches have been categorized into different methods by their objectives and learning metrics. Additionally, preprocessing and feature engineering techniques are discussed thoroughly for the vision-based domain. This paper also describes the benchmark databases used in training and detecting abnormal human behavior. Finally, the common challenges in video surveillance are discussed, to offer some possible solutions and directions for future research.

  • Research Article
  • Cited by 38
  • 10.1109/access.2024.3488797
Enhanced Anomaly Detection in Pandemic Surveillance Videos: An Attention Approach With EfficientNet-B0 and CBAM Integration
  • Jan 1, 2024
  • IEEE Access
  • Sareer Ul Amin + 4 more

We present a novel system for anomaly detection in surveillance videos, specifically focusing on identifying instances where individuals deviate from public health guidelines during the pandemic. These anomalies encompassed behaviours like the absence of face masks, incorrect mask usage, coughing, nose-picking, sneezing, spitting, and yawning. Monitoring such anomalies manually was challenging and prone to errors, necessitating automated solutions. To address this, a multi-attention-based deep learning system was employed, utilizing the EfficientNet-B0 architecture. EfficientNet-B0, featuring the Mobile Inverted Bottleneck Convolution (MBConv) block with Squeeze-and-Excitation (SE) modules, emphasizes informative channel characteristics while disregarding irrelevant ones. However, this approach neglected crucial spatial information necessary for visual recognition tasks. To improve this, the Convolutional Block Attention Module (CBAM) was integrated into EfficientNet-B0 to improve feature extraction. The baseline EfficientNet-B0 model’s SE module was replaced with the CBAM module within each MBConv module to retain spatial information related to anomaly activities. Additionally, the CBAM module, when embedded after the second convolutional layer, was observed to significantly enhance the classification ability of the model across different anomaly classes, resulting in a significant accuracy boost from 87 to 96%. In conclusion, we demonstrated the efficacy of the CBAM module in refining feature extraction and improving the classification performance of the proposed method, showcasing its potential for robust anomaly detection in surveillance videos.

  • Research Article
  • Cited by 116
  • 10.1016/j.patcog.2021.107865
Online anomaly detection in surveillance videos with asymptotic bound on false alarm rate
  • Feb 1, 2021
  • Pattern Recognition
  • Keval Doshi + 1 more

  • Research Article
  • 10.52783/pmj.v34.i3.1778
Anomaly Detection in Video Surveillance: A Comparative Analysis of Deep Learning Models
  • Oct 1, 2024
  • Panamerican Mathematical Journal
  • Sangita Mahendra Rajput

Anomaly detection in video surveillance is critical for enhancing security and public safety across various applications, including traffic monitoring, public spaces, and industrial settings. Traditional methods often struggle with the complexity and variability of real-world data, prompting a shift towards advanced machine learning models. This paper presents a comprehensive analysis of deep learning algorithms, including YOLOv5, 3D CNNs, LSTM, Deep SVDD, Vision Transformers, Temporal Transformers, and Autoencoders, applied to three benchmark datasets: CIFAR-10, MVTec AD, and UCSD Anomaly Detection. We compare these algorithms based on accuracy, precision, recall, and F1-score, providing insight into their strengths and weaknesses. The results suggest that Vision Transformers and CNN-LSTM hybrids offer superior performance across spatial and temporal anomaly detection tasks.

  • Research Article
  • Cited by 60
  • 10.3390/app12125772
Development and Optimization of Deep Learning Models for Weapon Detection in Surveillance Videos
  • Jun 7, 2022
  • Applied Sciences
  • Soban Ahmed + 4 more

Weapon detection in CCTV camera surveillance videos is a challenging task and its importance is increasing because of the availability and easy access of weapons in the market. This becomes a big problem when weapons go into the wrong hands and are often misused. Advances in computer vision and object detection are enabling us to detect weapons in live videos without human intervention and, in turn, intelligent decisions can be made to protect people from dangerous situations. In this article, we have developed and presented an improved real-time weapon detection system that shows a higher mean average precision (mAP) score and better inference time performance compared to the previously proposed approaches in the literature. Using a custom weapons dataset, we implemented a state-of-the-art Scaled-YOLOv4 model that resulted in a 92.1 mAP score and frames per second (FPS) of 85.7 on a high-performance GPU (RTX 2080TI). Furthermore, to achieve the benefits of lower latency, higher throughput, and improved privacy, we optimized our model for implementation on a popular edge-computing device (Jetson Nano GPU) with the TensorRT network optimizer. We have also performed a comparative analysis of the previous weapon detector with our presented model using different CPU and GPU machines that fulfill the purpose of this work, making the selection of model and computing device easier for the users for deployment in a real-time scenario. The analysis shows that our presented models result in improved mAP scores on high-performance GPUs (such as RTX 2080TI), as well as on low-cost edge computing GPUs (such as Jetson Nano) for weapon detection in live CCTV camera surveillance videos.

  • Conference Article
  • Cited by 1
  • 10.1117/12.2652301
Anomaly detection and recognition of video surveillance images based on deep learning
  • Nov 10, 2022
  • Zehao Bao + 3 more

To address the poor generalization ability of traditional algorithms and the high cost of manual inspection for abnormal image detection in remote video surveillance, this paper proposes a deep learning-based algorithm for abnormal image detection in video surveillance. First, a convolutional neural network based on VGG-16 is initialized with the he_normal method; then a self-made dataset is preprocessed and fed into the network for training; finally, a model for detecting abnormal interference in video surveillance images is obtained. Experimental results show that this method can detect abnormal interference such as overexposure, color distortion, and video freezes in video surveillance, with an accuracy of 86%.

  • Conference Article
  • Cited by 12
  • 10.1109/icccnt49239.2020.9225378
Temporal Pooling in Inflated 3DCNN for Weakly-supervised Video Anomaly Detection
  • Jul 1, 2020
  • Snehashis Majhi + 2 more

Anomaly detection in surveillance videos requires significant attention to feature engineering to discriminate anomalous activity patterns from normal patterns. With this in mind, this paper aims to extract high-quality spatio-temporal features from an Inflated 3DCNN, followed by a temporal pooling strategy to intensify relevant spatio-temporal features in untrimmed anomalous videos. A better temporal pooling strategy leads to a better understanding of temporal dependency through an LSTM model, which has become a necessary step for anomaly detection in surveillance videos. Thus, we propose a method that combines an ideal temporal pooling strategy on the inflated 3DCNN feature map with an LSTM model for temporal dependency encoding in the weakly-supervised anomaly detection task. Our method is validated on a large-scale video anomaly detection dataset, UCF-Crime, resulting in competitive performance against recent state-of-the-art methods.

  • Conference Article
  • Cited by 38
  • 10.1109/ncc.2018.8599969
A Deep Learning Based Technique for Anomaly Detection in Surveillance Videos
  • Feb 1, 2018
  • Prakhar Singh + 1 more

In this paper the problem of anomaly detection in surveillance videos is addressed, which refers to the detection of events that do not conform to normal behaviour. To solve this problem, this paper proposes an approach that utilizes a Deep Neural Network (DNN) to model normal behaviour. Specifically, a DNN is built that learns to predict future frames from past frames using a normal (anomaly free) dataset. The predictions from the model are then compared with testing video for similarity, and the resulting error is used to detect anomalies. Benchmarks of the proposed approach on two datasets common in the anomaly detection literature show that it performs comparably to other methods in the literature, even though it does not rely on any hand-crafted features. Moreover, comparison to other deep learning techniques in the literature shows that the proposed approach is significantly less complex.
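The scoring step described in this abstract (compare predicted frames with the observed ones and use the error to detect anomalies) can be sketched as follows. This is a minimal illustration, assuming mean squared error as the similarity measure and a fixed threshold; the abstract names neither, and the function names and toy frames are hypothetical.

```python
def frame_error(predicted, actual):
    """Mean squared error between two frames given as flat lists of pixel values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def flag_anomalies(predicted_frames, actual_frames, threshold):
    """Return (flags, errors): a frame is flagged when its prediction error exceeds threshold.

    The threshold is an assumed hyperparameter; in practice it would be tuned
    on held-out normal footage.
    """
    errors = [frame_error(p, a) for p, a in zip(predicted_frames, actual_frames)]
    flags = [e > threshold for e in errors]
    return flags, errors


# Toy example: the second observed frame deviates from what the model predicted.
predicted = [[0.0, 0.0], [1.0, 1.0]]
observed = [[0.0, 0.0], [3.0, 1.0]]
flags, errors = flag_anomalies(predicted, observed, threshold=1.0)
```

Because the predictor is trained only on normal footage, frames it predicts poorly are treated as candidates for anomalous events.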

  • Research Article
  • Cited by 19
  • 10.3390/s25010251
Deep BiLSTM Attention Model for Spatial and Temporal Anomaly Detection in Video Surveillance
  • Jan 4, 2025
  • Sensors
  • Sarfaraz Natha + 5 more

Detection of anomalies in video surveillance plays a key role in ensuring the safety and security of public spaces. The number of surveillance cameras is growing, making it harder to monitor them manually. So, automated systems are needed. This change increases the demand for automated systems that detect abnormal events or anomalies, such as road accidents, fighting, snatching, car fires, and explosions in real-time. These systems improve detection accuracy, minimize human error, and make security operations more efficient. In this study, we proposed the Composite Recurrent Bi-Attention (CRBA) model for detecting anomalies in surveillance videos. The CRBA model combines DenseNet201 for robust spatial feature extraction with BiLSTM networks that capture temporal dependencies across video frames. A multi-attention mechanism was also incorporated to direct the model’s focus to critical spatiotemporal regions. This improves the system’s ability to distinguish between normal and abnormal behaviors. By integrating these methodologies, the CRBA model improves the detection and classification of anomalies in surveillance videos, effectively addressing both spatial and temporal challenges. Experimental assessments demonstrate that the CRBA model achieves high accuracy on both the University of Central Florida (UCF) and the newly developed Road Anomaly Dataset (RAD). This model enhances detection accuracy while also improving resource efficiency and minimizing response times in critical situations. These advantages make it an invaluable tool for public safety and security operations, where rapid and accurate responses are needed for maintaining safety.

  • Conference Article
  • 10.1117/12.2573117
Target analysis based anomaly detection in surveillance videos
  • Jun 12, 2020
  • Jie Zhang + 1 more

Abnormal behavior detection in surveillance video is a pivotal part of the intelligent city. Most existing methods only consider how to detect anomalies and pay less attention to explaining the reasons for them. In this work, we investigate an orthogonal perspective based on the reasons for these abnormal behaviors. We propose a multivariate fusion method that analyzes each target through three branches: object, action and motion. The object branch focuses on appearance information, the motion branch focuses on the distribution of the motion features, and the action branch focuses on the action category of the target. Because these branches focus on different information, they complement each other and jointly detect abnormal behavior. The final abnormal score is then obtained by combining the abnormal scores of the three branches. In the action branch, we also propose an action recognition module that uses inter-frame information to handle multi-target and multi-action recognition in surveillance video, which has not previously been used in the anomaly detection field. The proposed method outperforms the state-of-the-art methods and can also explain why a target is detected as an anomaly.

  • Conference Article
  • 10.1109/cscwd49262.2021.9437826
Selection Biased Positive and Unlabeled Learning Method for Anomaly Detection in Surveillance Videos
  • May 5, 2021
  • Feiyu Shang + 3 more

Anomaly detection in surveillance videos aims at identifying abnormal events under specific scenarios and is widely applied in public security, smart cities, and pedestrian surveillance. In the weakly-supervised setting, most existing anomaly detection approaches are formulated as a classic multiple-instance learning problem. In this paper, we provide a different perspective: selection-biased positive and unlabeled learning. From this viewpoint, once the label frequency is estimated from the training set, we can effectively apply supervised classifiers to weakly supervised anomaly detection and take greater advantage of these well-developed classifiers. For this purpose, we present a novel method to estimate the label frequency from the attribute subdomains with large label probability. In the test phase, we only use the label frequency to modify the supervised classifier. Comprehensive experiments are performed on datasets of different scales. Our method achieves superior results on all datasets, which demonstrates its effectiveness.
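To make the role of the label frequency concrete, here is a minimal sketch of the classic positive-unlabeled correction, which assumes labeled positives are selected completely at random. This is a swapped-in textbook technique, not the selection-biased estimator this paper proposes; the function names and scores are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive) as the mean classifier score on labeled positives.

    Assumes the classifier was trained to separate labeled from unlabeled samples.
    """
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def corrected_probability(score, label_frequency):
    """Turn P(labeled | x) into an estimate of P(positive | x) by dividing by c, clipped at 1."""
    return min(1.0, score / label_frequency)


# Toy example: held-out labeled positives receive scores of 0.4 and 0.6.
c = estimate_label_frequency([0.4, 0.6])
p_anomaly = corrected_probability(0.3, c)
```

Only the scalar `c` is needed at test time, which matches the abstract's statement that the label frequency alone is used to modify the supervised classifier.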

  • Research Article
  • Cited by 1
  • 10.30572/2018/kje/150409
CLOUD-SMART SURVEILLANCE: ENHANCING ANOMALY DETECTION IN VIDEO STREAMS WITH DF-CONVLSTM-BASED VAE-GAN
  • Nov 1, 2024
  • Kufa Journal of Engineering
  • Sivalingan H

Anomaly detection in computer vision is crucial, and manual identification of irregularities in videos is resource-intensive. Autonomous systems are essential for efficiently analysing and detecting anomalies in diverse video datasets. Video surveillance relies heavily on anomaly detection for monitoring equipment states through time-series data. Presently, deep learning methods, particularly those based on Generative Adversarial Networks (GAN), have gained prominence in time-series anomaly detection. This paper proposes a novel solution: the double-flow convolutional Long Short-Term Memory (DF-ConvLSTM)-based Variational Autoencoder-Generative Adversarial Network (VAE-GAN) method. By co-training the encoder, generator, and discriminator, this approach leverages the encoder's mapping skills and the discriminator's discrimination capabilities simultaneously. The proposed strategy is compared with LSTM-VAE, LSTM-VAE-Attention, and VAE, and is evaluated using recall, accuracy, precision, and F1 score. With a classification accuracy of 91% on the University of Central Florida (UCF) crime dataset, the proposed method outperformed the alternative techniques. Furthermore, the analysis of the ROC (Receiver Operating Characteristic) curve revealed that the suggested method performed better than the others, as evidenced by its higher ROC values. Experimental results demonstrate the proposed method's ability to rapidly and accurately detect anomalies in surveillance videos, ensuring efficient and reliable anomaly detection. However, challenges include high computational costs, affecting the practicality of implementation for real-time anomaly detection.

  • Research Article
  • Cited by 38
  • 10.1109/lsp.2021.3117737
Cross-Epoch Learning for Weakly Supervised Anomaly Detection in Surveillance Videos
  • Jan 1, 2021
  • IEEE Signal Processing Letters
  • Shenghao Yu + 4 more

Weakly Supervised Anomaly Detection (WSAD) in surveillance videos is a complex task since usually only video-level annotations are available. Previous work treated it as a regression problem by giving different scores on normal and anomaly events. However, the widely used mini-batch training strategy may suffer from the data imbalance between these two types of events, which limits the model's performance. In this work, a cross-epoch learning (XEL) strategy associated with a hard instance bank (HIB) is proposed to introduce additional information from previous training epochs. Two new losses are proposed for XEL to achieve a higher detection rate as well as a lower false alarm rate of anomaly events. Moreover, the proposed XEL can be directly integrated into any existing WSAD framework. Experimental results of three XEL embedded models have shown promising AUC improvement (3%~7%) on two public datasets, surpassing the state-of-the-art methods. Our code is available at: https://github.com/sdjsngs/XEL-WSAD.
