Abstract

Human action recognition (HAR) is an important area of research in computer vision. Although considerable effort has gone into HAR in the visible spectrum, progress in the infrared domain remains very limited. This is because very few infrared action recognition datasets are publicly available, and those that exist have a limited number of classes and training samples. To address this issue, we first construct a new dataset, the IITR Infrared Action Recognition (IITR-IAR) dataset, with 21 classes of 70 samples each (1470 videos in total). Two modalities are then extracted from each video: the Stacked Dense Flow Difference Image (SDFDI) and our newly proposed Stacked Saliency Difference Image (SSDI). Second, we propose a novel four-stream deep framework built upon convolutional neural network (CNN) and recurrent neural network (RNN) models. The CNN component is based on the deep residual architecture ResNet, while the RNN component is based on the bidirectional long short-term memory (BiLSTM) model. Third, to capture spatio-temporal information at the global level, a single SDFDI and a single SSDI are generated from the entire video and used to train two CNN streams. Similarly, to capture spatio-temporal information at the local level, each video is divided into eight equal segments, from which eight SDFDIs and eight SSDIs are generated. These multiple SDFDIs and SSDIs are then used to train two CNN-BiLSTM streams. Finally, the outputs of all four streams are combined by late fusion to predict the class label. With this four-stream architecture, we achieve a state-of-the-art result of 83.5% on the InfAR dataset. We also present a baseline result of 75.17% on our proposed IITR-IAR dataset, leaving ample scope for the computer vision community to develop and apply more advanced deep learning techniques for infrared HAR.
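The two structural steps described above, dividing a video into eight equal segments and combining the four stream outputs by late fusion, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `split_into_segments` and `late_fusion` are hypothetical, the handling of frame counts that are not divisible by eight is an assumption, and score averaging is only one common late-fusion rule, since the abstract does not state which rule is used.

```python
from typing import List, Sequence


def split_into_segments(num_frames: int, num_segments: int = 8) -> List[range]:
    """Split frame indices 0..num_frames-1 into contiguous, near-equal segments.

    Assumption: when num_frames is not divisible by num_segments, segment
    lengths are allowed to differ by at most one frame.
    """
    if num_frames < num_segments:
        raise ValueError("video has fewer frames than segments")
    # Integer boundaries between consecutive segments.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def late_fusion(stream_scores: Sequence[Sequence[float]]) -> int:
    """Average per-class scores across streams and return the best class index.

    Assumption: plain averaging; the fusion rule in the paper may differ.
    """
    num_classes = len(stream_scores[0])
    if any(len(scores) != num_classes for scores in stream_scores):
        raise ValueError("all streams must score the same set of classes")
    fused = [
        sum(scores[c] for scores in stream_scores) / len(stream_scores)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=fused.__getitem__)


# Usage: an 80-frame video yields eight 10-frame segments; each segment would
# feed one SDFDI and one SSDI into the CNN-BiLSTM streams.
segments = split_into_segments(80)

# Four streams (two CNN, two CNN-BiLSTM), each giving made-up scores for 3 classes.
predicted = late_fusion([
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.1, 0.6, 0.3],
])
```

Averaging weights every stream equally; a weighted sum or a product of scores would be drop-in alternatives if some streams prove more reliable than others.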
