Action Recognition Using Multi-Scale Temporal Shift Module and Temporal Feature Difference Extraction Based on 2D CNN

Kun-Hsuan Wu, Ching-Te Chiu

doi:10.4236/jsea.2021.145011

Abstract

Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Three-dimensional convolutional filters can learn the temporal correlation between frames by extracting features from multiple frames simultaneously, but they sharply increase the number of parameters and the computational cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, computing the optical flow adds computational cost and requires a separate model to learn the optical-flow features. We proposed an action recognition framework based on a two-dimensional convolutional neural network, which therefore had to address the lack of temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module and combined it with a temporal feature difference extraction module, which extracts the difference between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. After compression, the accuracy remained high, at 72.19% on HMDB51 and 95.57% on UCF-101. The final model was more compact than those in most related works.
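As a rough illustration of the two ideas named in the abstract, the sketch below shifts a few channels along the time axis and takes differences between the features of consecutive frames. It is not the authors' implementation: the function names, the `fold` and `step` parameters, the zero padding at the sequence boundaries, and the omission of spatial dimensions are all assumptions made to keep the example small. Calling the shift with several `step` values is one hypothetical way to mimic multiple temporal scales.

```python
def temporal_shift(frames, fold, step=1):
    """Shift some channels along the time axis (illustrative sketch only).

    frames: list of T frames, each a list of C channel values
            (spatial dimensions are omitted; assumes 2 * fold <= C).
    fold:   number of channels shifted in each direction (assumed parameter).
    step:   how many frames to shift by (assumed parameter).
    """
    num_frames = len(frames)
    out = [list(f) for f in frames]  # copy; unshifted channels stay unchanged
    for t in range(num_frames):
        for c in range(fold):
            # The first `fold` channels take values from `step` frames earlier.
            out[t][c] = frames[t - step][c] if t - step >= 0 else 0.0
        for c in range(fold, 2 * fold):
            # The next `fold` channels take values from `step` frames later.
            out[t][c] = frames[t + step][c] if t + step < num_frames else 0.0
    return out


def feature_difference(frames):
    """Element-wise difference between the features of consecutive frames."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


# Three frames with three channels each (toy values).
frames = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]]
shifted = temporal_shift(frames, fold=1)           # shift by one frame
shifted_far = temporal_shift(frames, fold=1, step=2)  # shift by two frames
diffs = feature_difference(frames)
```

With `step=1`, each frame receives channel values from its immediate neighbours; a larger `step` reaches frames further away. The shift itself introduces no learnable parameters.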

Highlights

In the field of computer vision, human action recognition has become increasingly research-worthy
Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships
We proposed an action recognition framework based on the two-dimensional convolutional neural network; it was necessary to resolve the lack of temporal relationships

Summary

Introduction

In the field of computer vision, human action recognition has attracted increasing research interest, and with the development of technology it now has wide applications. There are two main approaches to action recognition: the two-dimensional (2D) CNN (convolutional neural network) and the three-dimensional (3D) CNN. The 2D CNN method performs convolution on one frame at a time, without temporal fusion. The 3D CNN method performs convolution on multiple frames using 3D convolutional filters to achieve spatio-temporal learning. Several studies on action recognition directly inflated the filters of existing models from 2D to 3D to obtain inflated 3D ConvNets (I3D) [7], resolution 3D LLC (Res3D) [8], and ResNeXt3D [9], among other models.
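The parameter gap between the two approaches can be made concrete with a simple count. The sketch below is only an illustration under assumed layer sizes (64 input and output channels, kernel size 3); the helper names are hypothetical and not taken from the paper.

```python
def conv2d_params(c_in, c_out, k):
    """Weights plus biases of one 2D convolution layer with a k x k kernel."""
    return c_in * c_out * k * k + c_out


def conv3d_params(c_in, c_out, k, k_t):
    """Weights plus biases of one 3D convolution layer with a k_t x k x k kernel."""
    return c_in * c_out * k_t * k * k + c_out


# Assumed layer sizes, chosen only for illustration.
p2d = conv2d_params(64, 64, 3)     # 3 x 3 kernel
p3d = conv3d_params(64, 64, 3, 3)  # 3 x 3 x 3 kernel
print(p2d, p3d)  # 36928 110656
```

For the same channel counts, a 3 x 3 x 3 filter bank holds roughly three times the weights of a 3 x 3 one, and the computation per output position grows by the same factor.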

Methods

Results

Conclusion

Journal: Journal of Software Engineering and Applications	Publication Date: Jan 1, 2021
Citations: 8	License type: CC BY 4.0

Similar Papers

Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention
Fazliddin Anvarov ... Dae Ha Kim
Electronics | VOL. 9
12 Jan 2020

Classification of tree symbiotic fungi based on hyperspectral imagery and hybrid convolutional neural networks
Zhuo Liu ... Zhilin Yuan
Frontiers in Forests and Global Change | VOL. 6
05 May 2023

Protein secondary structure prediction improved by recurrent neural networks integrated with two-dimensional convolutional neural networks.
Yanbu Guo ... Bei Yang
Journal of Bioinformatics and Computational Biology | VOL. 16
01 Oct 2018

Various frameworks for integrating image and video streams for spatiotemporal information learning employing 2D–3D residual networks for human action recognition
Shaimaa Yosry ... Rania R Ziedan
Discover Applied Sciences | VOL. 6
18 Mar 2024
