Abstract

Human action recognition (HAR) is an important area of research in computer vision. Although considerable effort has been devoted to HAR in the visible spectrum, progress in the infrared domain remains limited, largely because very few infrared action recognition datasets are publicly available, and those that exist offer only a small number of classes and training samples. To address this issue, we first construct a new dataset, the IITR Infrared Action Recognition (IITR-IAR) dataset, with 21 classes of 70 samples each (1470 videos in total). Two modalities are then extracted from each video: the Stacked Dense Flow Difference Image (SDFDI) and our newly proposed Stacked Saliency Difference Image (SSDI). Second, we propose a novel four-stream deep framework built upon convolutional neural network (CNN) and recurrent neural network (RNN) models: each CNN stream is based on the deep residual ResNet architecture, while each RNN stream is based on a bidirectional long short-term memory (BiLSTM) model. Third, to capture spatio-temporal information at the global level, a single SDFDI and a single SSDI are generated from the entire video and used to train two CNN streams. Similarly, to capture spatio-temporal information at the local level, each video is divided into eight equal segments, yielding eight SDFDIs and eight SSDIs, which are used to train two CNN-BiLSTM streams. Finally, the outputs of all four streams are combined by late fusion to predict the class label. With this four-stream architecture, we achieve state-of-the-art results (83.5%) on the InfAR dataset. We also report a baseline accuracy of 75.17% on the proposed IITR-IAR dataset, leaving ample scope for the computer vision community to develop and apply more advanced deep learning techniques to infrared HAR.
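
The abstract describes, but does not implement, the four-stream design. The following is a minimal PyTorch sketch under stated assumptions: a ResNet-18 backbone, a 256-unit BiLSTM, 3-channel SDFDI/SSDI inputs, last-time-step pooling of the BiLSTM output, and softmax-score averaging for late fusion. All class and function names (GlobalCNNStream, SegmentCNNBiLSTM, FourStreamFusion) are illustrative, not from the paper.

```python
# Hedged sketch of a four-stream late-fusion model, NOT the authors' code.
# Assumptions: ResNet-18 trunk, 3-channel SDFDI/SSDI images, hidden=256,
# late fusion by averaging per-stream softmax scores.
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_CLASSES = 21   # IITR-IAR has 21 action classes
NUM_SEGMENTS = 8   # the paper splits each video into eight equal segments


def resnet_backbone():
    """ResNet-18 trunk with the final FC layer removed (512-d features)."""
    net = resnet18(weights=None)
    net.fc = nn.Identity()
    return net


class GlobalCNNStream(nn.Module):
    """CNN stream over a single whole-video SDFDI or SSDI."""
    def __init__(self):
        super().__init__()
        self.backbone = resnet_backbone()
        self.classifier = nn.Linear(512, NUM_CLASSES)

    def forward(self, x):          # x: (B, 3, H, W)
        return self.classifier(self.backbone(x))


class SegmentCNNBiLSTM(nn.Module):
    """CNN-BiLSTM stream over eight per-segment SDFDIs or SSDIs."""
    def __init__(self, hidden=256):
        super().__init__()
        self.backbone = resnet_backbone()
        self.bilstm = nn.LSTM(512, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, NUM_CLASSES)

    def forward(self, x):          # x: (B, NUM_SEGMENTS, 3, H, W)
        b, t = x.shape[:2]
        feats = self.backbone(x.flatten(0, 1)).view(b, t, -1)  # (B, T, 512)
        out, _ = self.bilstm(feats)
        # Pooling by the last time step is one simple choice; the paper may
        # aggregate the sequence differently.
        return self.classifier(out[:, -1])


class FourStreamFusion(nn.Module):
    """Late fusion: average the four streams' softmax class scores."""
    def __init__(self):
        super().__init__()
        self.sdfdi_cnn, self.ssdi_cnn = GlobalCNNStream(), GlobalCNNStream()
        self.sdfdi_rnn, self.ssdi_rnn = SegmentCNNBiLSTM(), SegmentCNNBiLSTM()

    def forward(self, sdfdi, ssdi, sdfdi_seq, ssdi_seq):
        scores = [self.sdfdi_cnn(sdfdi), self.ssdi_cnn(ssdi),
                  self.sdfdi_rnn(sdfdi_seq), self.ssdi_rnn(ssdi_seq)]
        return torch.stack([s.softmax(-1) for s in scores]).mean(0)
```

At inference, the argmax of the fused scores gives the predicted action class. Simple averaging is assumed here for the late-fusion step; a weighted combination of stream scores would fit the same skeleton.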
