Vision Transformer-Based Tailing Detection in Videos

Jaewoo Lee, Sungjun Lee, Zahid Ali Siddiqui, Wonki Cho, Unsang Park

doi:10.3390/app112411591

Abstract

Tailing is defined as an event where a suspicious person follows someone closely. We define the problem of tailing detection from videos as an anomaly detection problem, where the goal is to find abnormalities in the walking pattern of the pedestrians (victim and follower). We therefore propose a modified Time-Series Vision Transformer (TSViT), a method for anomaly detection in video, specifically for tailing detection with a small dataset. We introduce an effective way to train TSViT with a small dataset by regularizing the prediction model. To do so, we first encode the spatial information of the pedestrians into 2D patterns and then pass them as tokens to the TSViT. Through a series of experiments, we show that tailing detection on a small dataset using TSViT outperforms popular CNN-based architectures, as the CNN architectures tend to overfit with a small dataset of time-series images. We also show that, when using time-series images, the performance of CNN-based architectures gradually drops as the network depth is increased to raise capacity. In contrast, decreasing the number of heads in the Vision Transformer architecture yields good performance on time-series images, and the performance improves further as the input resolution of the images is increased. Experimental results demonstrate that TSViT performs better than the handcrafted rule-based method and the CNN-based method for tailing detection. TSViT can be used in many applications for video anomaly detection, even with a small dataset.
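The encoding step described in the abstract (pedestrian positions turned into a 2D pattern, then passed to the transformer as tokens) can be illustrated with a minimal sketch. This is not the authors' encoding: the grid size, the time-based cell values, the patch size, and the function names `encode_tracks` and `to_patch_tokens` are all assumptions chosen only to make the idea concrete.

```python
def encode_tracks(tracks, frame_w, frame_h, size=8):
    """Rasterize pedestrian tracks into a size x size grid (hypothetical encoding).

    Each track is a list of (x, y) positions, one per frame. A cell stores the
    normalized time (t + 1) / n of the latest visit, so the pattern keeps a
    rough trace of both position and order of movement.
    """
    grid = [[0.0] * size for _ in range(size)]
    for track in tracks:
        n = len(track)
        for t, (x, y) in enumerate(track):
            col = min(int(x / frame_w * size), size - 1)
            row = min(int(y / frame_h * size), size - 1)
            grid[row][col] = max(grid[row][col], (t + 1) / n)
    return grid


def to_patch_tokens(grid, patch=4):
    """Split a square grid into non-overlapping patches, flattened row-major.

    Each flattened patch plays the role of one input token, as in a
    ViT-style model (before any learned linear projection).
    """
    size = len(grid)
    tokens = []
    for r0 in range(0, size, patch):
        for c0 in range(0, size, patch):
            tokens.append([grid[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens


# Illustrative tracks only: a victim and a follower walking left to right.
victim = [(10, 50), (30, 50), (50, 50), (70, 50)]
follower = [(0, 52), (20, 52), (40, 52), (60, 52)]

pattern = encode_tracks([victim, follower], frame_w=100, frame_h=100, size=8)
tokens = to_patch_tokens(pattern, patch=4)
print(len(tokens), len(tokens[0]))
```

With an 8 x 8 grid and 4 x 4 patches, the sketch yields four tokens of 16 values each. A real model would project each token to an embedding and add positional information before the transformer layers.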

Highlights

Tailing is a situation in which one pedestrian follows another pedestrian in the same direction for some amount of time
The Transformer-based “Base” model, Time-Series Vision Transformer (TSViT)-B/512, achieves the best accuracy of 76.56%, compared with the 63.54% best accuracy of a simple Convolutional Neural Network (CNN), an improvement of 13.02 percentage points
An end-to-end trainable method of tailing detection based on the Vision Transformer is proposed

Summary

Introduction

Tailing is a situation in which one pedestrian follows another pedestrian in the same direction for some amount of time. The intention of a tailing person can range from assault and snatching to kidnapping. According to a survey [2], the surveillance cameras installed in 2016 worldwide will produce approximately 566 GB of data in one day. This rapid growth of surveillance video data presents greater challenges for video processing and understanding. Computer vision techniques for event detection [3], video retrieval [4], and video summarization [5] are an essential part of modern surveillance systems.
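To make the definition of tailing concrete, the following is a minimal rule-based sketch that flags one track as following another when both move in roughly the same direction and stay close for several consecutive steps. It is not the handcrafted baseline evaluated in the paper. The function name `is_tailing` and the thresholds `max_gap`, `min_steps`, and `min_cos` are assumptions for illustration only.

```python
import math


def is_tailing(leader, follower, max_gap=30.0, min_steps=3, min_cos=0.9):
    """Hypothetical rule: True if the follower stays within max_gap of the
    leader and moves in roughly the same direction (cosine >= min_cos)
    for at least min_steps consecutive frame-to-frame steps."""
    streak = 0
    steps = min(len(leader), len(follower)) - 1
    for t in range(steps):
        # Displacement vectors of both pedestrians between frames t and t + 1.
        lx, ly = leader[t + 1][0] - leader[t][0], leader[t + 1][1] - leader[t][1]
        fx, fy = follower[t + 1][0] - follower[t][0], follower[t + 1][1] - follower[t][1]
        norm = math.hypot(lx, ly) * math.hypot(fx, fy)
        cos = (lx * fx + ly * fy) / norm if norm > 0 else 0.0
        # Distance between the two pedestrians at frame t + 1.
        gap = math.hypot(leader[t + 1][0] - follower[t + 1][0],
                         leader[t + 1][1] - follower[t + 1][1])
        streak = streak + 1 if (cos >= min_cos and gap <= max_gap) else 0
        if streak >= min_steps:
            return True
    return False


# Illustrative tracks only: (x, y) positions, one per frame.
walker = [(10, 50), (30, 50), (50, 50), (70, 50)]
same_way = [(0, 52), (20, 52), (40, 52), (60, 52)]
opposite = [(60, 52), (40, 52), (20, 52), (0, 52)]

print(is_tailing(walker, same_way))
print(is_tailing(walker, opposite))
```

A fixed rule like this depends heavily on hand-picked thresholds, which is one reason a learned detector may be preferable.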



Journal: Applied Sciences
Publication Date: Dec 7, 2021
Citations: 4
License type: CC BY 4.0


