Abstract

This paper addresses the recognition of human actions in videos. Human action recognition can be seen as the automatic labeling of a video according to the actions occurring in it, and it has become one of the most challenging and attractive problems in the pattern recognition and video classification fields. The problem is difficult to solve with traditional video processing methods because of several challenges, such as background noise, the varying sizes of subjects across videos, and the speed of actions. Building on the progress of deep learning, several directions have been developed to recognize a human action from a video, such as the long short-term memory (LSTM)-based model, the two-stream convolutional neural network (CNN) model, and the convolutional 3D model. In this paper, we focus on the two-stream structure. The traditional two-stream CNN addresses the problem that CNNs alone do not perform satisfactorily on temporal features: by training a temporal stream that takes optical flow as input, a CNN gains the ability to extract temporal features. However, optical flow contains only limited temporal information because it records only the movements of pixels along the x-axis and the y-axis. Therefore, we design and implement a new two-stream model that uses an LSTM-based model in its spatial stream to extract both spatial and temporal features from RGB frames, in contrast to traditional approaches, which typically use the spatial stream to extract only spatial features. In addition, we implement a DenseNet in the temporal stream to improve recognition accuracy. Quantitative evaluation and experiments are conducted on the UCF-101 dataset, a well-established public video dataset. For the temporal stream, we use the optical flow of UCF-101, with the optical-flow images provided by the Graz University of Technology. The experimental results show that the proposed method outperforms the traditional two-stream CNN method by at least 3% in accuracy. For both the spatial and temporal streams, the proposed model also achieves higher recognition accuracies, and compared with state-of-the-art methods, the new model still achieves the best recognition performance.
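To make the temporal-stream input concrete, the sketch below shows one way to compute the stacked optical-flow fields the abstract describes, where each field records per-pixel displacement along the x- and y-axes between consecutive frames. It uses OpenCV's Farneback estimator; the file name and parameter values are illustrative assumptions, not the paper's exact pipeline (the paper uses precomputed flow images from the Graz University of Technology).

```python
# Minimal sketch of forming the temporal stream's input: dense optical
# flow gives, for each pixel, its displacement along the x- and y-axes
# between consecutive frames. Farneback parameters and the clip name
# are illustrative assumptions.
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.avi")         # hypothetical UCF-101 clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

flow_stack = []                            # one (dx, dy) field per frame pair
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flow_stack.append(flow)                # (H, W, 2): x and y motion
    prev_gray = gray
cap.release()

# Stacked flow fields, shape (T, H, W, 2), feed the temporal stream.
flows = np.stack(flow_stack) if flow_stack else np.empty((0,))
print(flows.shape)
```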

Highlights

  • Action recognition aims to recognize the motions and actions of objects

  • Human action recognition is used in some surveillance systems and video processing tools [2]

  • The proposed model can be decomposed into three modules: a spatial stream with long short-term memory (LSTM), a temporal stream with a DenseNet, and a fusion layer with a support vector machine (SVM) [10]; a minimal sketch of this design is given below
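The PyTorch sketch below illustrates this three-module design. The backbone choices (a ResNet-18 per-frame extractor in the spatial stream, DenseNet-121 in the temporal stream), the feature and hidden sizes, and the clip lengths are illustrative assumptions; in particular, the paper fuses the two streams' scores with an SVM, for which the softmax average here is only a stand-in.

```python
# Hedged sketch of the two-stream design described above (PyTorch).
# Backbones, sizes, and the score-averaging fusion are assumptions;
# the paper fuses stream scores with an SVM.
import torch
import torch.nn as nn
from torchvision.models import resnet18, densenet121

NUM_CLASSES = 101  # UCF-101

class SpatialStreamLSTM(nn.Module):
    """CNN per-frame features -> LSTM over time -> class scores."""
    def __init__(self, num_classes=NUM_CLASSES, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)      # per-frame feature extractor
        backbone.fc = nn.Identity()            # keep the 512-d features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)) # (B*T, 512)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)              # (B, T, hidden)
        return self.fc(out[:, -1])             # score from the last step

class TemporalStreamDenseNet(nn.Module):
    """DenseNet over a stack of optical-flow fields (x/y per frame pair)."""
    def __init__(self, num_classes=NUM_CLASSES, flow_len=10):
        super().__init__()
        net = densenet121(weights=None)
        # Replace the first conv: 2*flow_len flow channels instead of 3 RGB.
        net.features.conv0 = nn.Conv2d(2 * flow_len, 64, kernel_size=7,
                                       stride=2, padding=3, bias=False)
        net.classifier = nn.Linear(net.classifier.in_features, num_classes)
        self.net = net

    def forward(self, flow):                   # flow: (B, 2*flow_len, H, W)
        return self.net(flow)

# Late fusion: the paper trains an SVM on the two streams' scores;
# a simple softmax average is used here as a stand-in.
def fuse(spatial_scores, temporal_scores):
    return (spatial_scores.softmax(-1) + temporal_scores.softmax(-1)) / 2

if __name__ == "__main__":
    frames = torch.randn(2, 16, 3, 224, 224)   # 16 RGB frames per clip
    flow = torch.randn(2, 20, 224, 224)        # 10 flow fields, x and y
    scores = fuse(SpatialStreamLSTM()(frames), TemporalStreamDenseNet()(flow))
    print(scores.argmax(-1))                   # predicted UCF-101 classes
```

Note the division of labor: the spatial stream sees raw RGB frames and lets the LSTM capture temporal structure on top of per-frame CNN features, while the temporal stream sees only stacked flow fields, whose first convolution is widened to accept the 2x10 motion channels.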


Introduction

Action recognition aims to recognize the motions and actions of objects. In the human action recognition field, vision-based action recognition is one of the most popular and essential problems [1]; it requires approaches that track and distinguish the behavior of a subject through videos. Human action recognition is used in surveillance systems and video processing tools [2]. With the rapid development of computer vision and neural networks, vast improvements have been achieved in the action recognition field [3, 4]. CNNs can extract spatial features from RGB video frames, much as they do in image recognition [5, 6]. The critical challenge of video-based human action recognition is how to obtain and handle temporal features effectively.
