Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

Altaf Hussain, Sung Wook Baik, Tanveer Hussain, Waseem Ullah

doi:10.1155/2022/3454167

Open Access

https://doi.org/10.1155/2022/3454167

Journal: Computational Intelligence and Neuroscience
Publication Date: Apr 4, 2022
Citations: 37
License type: CC BY 4.0

Affiliation: Sejong University

Abstract

Human Activity Recognition (HAR) is an active research area in which several Convolutional Neural Network (CNN) based feature extraction and classification methods are employed for surveillance and other applications. However, accurately recognizing activities from a sequence of frames is challenging because of cluttered backgrounds, varying viewpoints, low resolution, and partial occlusion. Current CNN-based techniques pair large-scale, computation-heavy classifiers with convolutional operators that have local receptive fields, which limits their ability to capture long-range temporal information. Therefore, in this work, we introduce a convolution-free approach for accurate HAR that overcomes these problems and accurately encodes relative spatial information. In the proposed framework, frame-level features are extracted with a pretrained Vision Transformer; these features are then passed to a multilayer long short-term memory (LSTM) network to capture the long-range dependencies of the actions in surveillance videos. To validate the proposed framework, we carried out extensive experiments on the UCF50 and HMDB51 benchmark HAR datasets and improved accuracy by 0.944% and 1.414%, respectively, compared with state-of-the-art deep models.
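The two-stage data flow described in the abstract (per-frame features, then a stacked recurrent model over the frame sequence) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: `patch_features` is a hypothetical stand-in that merely averages image patches in place of a pretrained Vision Transformer, the weights are random and untrained, and the frame size, hidden size, layer count, and number of classes are illustrative assumptions chosen only to show how the tensors move through the pipeline.

```python
import math
import random


def patch_features(frame, patch=2):
    """Hypothetical stand-in for a pretrained ViT encoder (assumption):
    split the frame into non-overlapping patch x patch tiles and return
    the mean intensity of each tile as the frame-level feature vector."""
    h, w = len(frame), len(frame[0])
    feats = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tile = [frame[r + i][c + j] for i in range(patch) for j in range(patch)]
            feats.append(sum(tile) / len(tile))
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """A single LSTM layer with random, untrained weights (illustration only)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim

        def mat():
            # Each row multiplies the concatenation [x_t, h_{t-1}, 1] (bias last).
            return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim + 1)]
                    for _ in range(hid_dim)]

        # One matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {gate: mat() for gate in "ifgo"}

    def run(self, xs):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        outputs = []
        for x in xs:
            z = x + h + [1.0]
            pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]
                   for gate in "ifgo"}
            i_t = [sigmoid(v) for v in pre["i"]]
            f_t = [sigmoid(v) for v in pre["f"]]
            g_t = [math.tanh(v) for v in pre["g"]]
            o_t = [sigmoid(v) for v in pre["o"]]
            # Cell state mixes the old memory with the new candidate values.
            c = [f * ck + i * g for f, ck, i, g in zip(f_t, c, i_t, g_t)]
            h = [o * math.tanh(ck) for o, ck in zip(o_t, c)]
            outputs.append(h)
        return outputs


def classify(frames, layers, head):
    """Frame features -> stacked LSTM layers -> softmax over the last hidden state."""
    seq = [patch_features(f) for f in frames]
    for layer in layers:
        seq = layer.run(seq)
    last = seq[-1]
    logits = [sum(w * v for w, v in zip(row, last)) for row in head]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Toy clip: 8 frames of 4x4 grayscale pixels, so each frame yields 4 features.
frames = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(8)]
layers = [LSTMLayer(4, 6, rng), LSTMLayer(6, 6, rng)]  # two stacked layers
head = [[rng.uniform(-0.1, 0.1) for _ in range(6)] for _ in range(3)]  # 3 classes
probs = classify(frames, layers, head)
```

In a realistic setup, `patch_features` would be replaced by embeddings from an actual pretrained Vision Transformer and the recurrent layers and classification head would be trained on labeled clips; the sketch only mirrors the order of operations the abstract describes.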

Similar Papers

Action Recognition Using 3D CNN and LSTM for Video Analytics
A Umamakeswari ... Rashikha
01 Jan 2020

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks
Hanbo Wu ... Xin Ma
International Journal of Advanced Robotic Systems | VOL. 16
01 Jan 2019

Human action recognition in surveillance video of a computer laboratory
Abdul-Lateef Yussiff ... Yong Suet-Peng
01 Aug 2016

Action Recognition in Videos with Temporal Segments Fusions
Yuanye Fang ... Qiu-Feng Wang
01 Jan 2020
