Abstract

Action recognition is an active research field that aims to recognize human actions and intentions from a series of observations of human behavior and the environment. Unlike image-based action recognition, which mainly uses two-dimensional (2D) convolutional neural networks (CNNs), video-based action recognition must characterize both short-term small movements and long-term temporal appearance information. Previous methods analyze video action behavior using only a basic 3D CNN framework. However, these approaches are limited in analyzing fast action movements or abruptly appearing objects because of the limited receptive field of the convolutional filter. In this paper, we propose aggregating squeeze-and-excitation (SE) and self-attention (SA) modules with a 3D CNN to analyze both short- and long-term temporal action behavior efficiently. We implemented the SE and SA modules to present a novel approach to video action recognition that builds upon current state-of-the-art methods and demonstrates better performance on the UCF-101 and HMDB51 datasets. For example, with the ResNeXt-101 architecture in a 3D CNN, we achieve accuracies of 92.5% (16f-clip) and 95.6% (64f-clip) on UCF-101, and 68.1% (16f-clip) and 74.1% (64f-clip) on HMDB51.
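To illustrate the squeeze-and-excitation idea on 3D-CNN feature maps, the following is a minimal numpy sketch (not the paper's implementation): channel descriptors are pooled over the spatiotemporal dimensions, passed through a bottleneck, and used to gate each channel. The weight matrices `w1` and `w2` and the reduction ratio `r` are hypothetical, randomly initialized here for illustration.

```python
import numpy as np

def squeeze_excite_3d(feat, w1, w2):
    """Channel-wise squeeze-and-excitation gating for a 3D-CNN
    feature map of shape (C, T, H, W).

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights
    (hypothetical, randomly initialized for this sketch).
    """
    # Squeeze: global average pool over the spatiotemporal dimensions.
    z = feat.mean(axis=(1, 2, 3))               # (C,)
    # Excitation: bottleneck MLP followed by a sigmoid gate.
    h = np.maximum(w1 @ z, 0.0)                 # ReLU, (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))         # sigmoid gate in (0, 1), (C,)
    # Scale: reweight each channel of the original feature map.
    return feat * s[:, None, None, None]

rng = np.random.default_rng(0)
C, r = 8, 2
feat = rng.standard_normal((C, 4, 6, 6))        # toy clip features
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = squeeze_excite_3d(feat, w1, w2)
print(out.shape)  # (8, 4, 6, 6)
```

Because the gate is a sigmoid, each channel is attenuated rather than amplified, which lets the network emphasize informative channels cheaply.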

Highlights

  • One of the main objectives of artificial intelligence is to build a model that can accurately learn human actions and intentions [1]

  • The representative research in video-based action recognition is based on two-stream architectures [2], recurrent neural networks (RNN) [3], or spatiotemporal convolutions [4,5]

  • We propose a sequential version of the SE and SA modules and apply it to create a new approach for efficiently analyzing action behavior with a 3D convolutional neural network (CNN)
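The self-attention half of the proposed pair can be sketched as attention across the temporal axis; the following toy numpy version (an illustration, not the paper's module) shows how every frame feature attends to every other frame, so distant time steps influence each other directly. The projection matrices `wq`, `wk`, and `wv` are hypothetical and randomly initialized.

```python
import numpy as np

def temporal_self_attention(x, wq, wk, wv):
    """Single-head self-attention across the temporal axis.

    x: (T, D) per-frame feature vectors; wq/wk/wv: (D, D) projection
    weights (hypothetical, random in this sketch).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])        # (T, T) frame affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over time
    return attn @ v                               # (T, D) context features

rng = np.random.default_rng(1)
T, D = 5, 8
x = rng.standard_normal((T, D))
wq, wk, wv = (rng.standard_normal((D, D)) for _ in range(3))
out = temporal_self_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8)
```

Unlike a convolution, whose temporal reach is bounded by the kernel size, this (T, T) affinity matrix covers the whole clip in one step, which is the motivation for pairing SA with the 3D CNN.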


Summary

Introduction

One of the main objectives of artificial intelligence is to build a model that can accurately learn human actions and intentions [1]. Human action recognition is important because it has been applied to various applications, such as surveillance systems, health care systems, and social robots. A three-dimensional (3D) convolutional neural network (CNN) for action recognition with spatiotemporal convolutional kernels achieves better performance than 2D CNNs, which cover only the spatial dimensions. The representative research in video-based action recognition is based on two-stream architectures [2], recurrent neural networks (RNN) [3], or spatiotemporal convolutions [4,5]. Most of this research has relied on modeling motion and temporal structures. Two-stream approaches use two separate CNNs: one processes red–green–blue (RGB) frames, and the other processes optical flow images to capture movement.
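To make the spatiotemporal-kernel idea concrete, here is a minimal numpy sketch of a "valid" 3D convolution on a single-channel clip (a toy illustration, not any particular network's layer): the filter spans `kt` neighboring frames as well as a spatial window, which is how a 3D CNN captures short-term motion that a purely spatial 2D kernel cannot.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """'Valid' 3D convolution of a single-channel video clip.

    clip: (T, H, W) frames; kernel: (kt, kh, kw) spatiotemporal filter.
    The output shrinks by (k - 1) along each axis, as with any
    valid-mode convolution.
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # Each output value pools a small spatiotemporal cube.
                out[t, i, j] = np.sum(
                    clip[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

clip = np.arange(4 * 5 * 5, dtype=float).reshape(4, 5, 5)
kernel = np.ones((3, 3, 3)) / 27.0   # averages a 3x3x3 space-time cube
out = conv3d_valid(clip, kernel)
print(out.shape)  # (2, 3, 3)
```

With the averaging kernel above, each output value is simply the mean of the corresponding 3×3×3 space-time cube, making it easy to verify the layer by hand.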

