Abstract

Automated human action recognition is one of the most attractive and practical research fields in computer vision. In such systems, human action labelling is based on the appearance and motion patterns in video sequences; however, the majority of existing research, including conventional methodologies and classic neural networks, either neglects temporal information or is unable to exploit it for action recognition in a video sequence. Moreover, accurate human action recognition is computationally expensive. In this paper, we address the challenges of the preprocessing phase through automated selection of representative frames from the input sequences, extracting the key features of the representative frames rather than the entire feature set. We propose a hierarchical technique using background subtraction and Histogram of Oriented Gradients (HOG), followed by the application of a deep neural network and a skeletal modelling method. A combination of a CNN and an LSTM recurrent network performs feature selection and retains temporal information, and finally a Softmax-KNN classifier labels the human activities. We name our model the “Hierarchical Feature Reduction & Deep Learning”-based action recognition method, or HFR-DL for short. To evaluate the proposed method, we benchmark on the UCF101 dataset, which is widely used in the action recognition research field and includes 101 complex activities in the wild. Experimental results show a significant improvement in accuracy and speed in comparison with eight state-of-the-art methods.
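The representative-frame selection described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration assuming a simple median-based background model over grayscale frames; the paper's actual BGS and HOG pipeline is more elaborate, and the function name and toy data here are hypothetical:

```python
import numpy as np

def select_representative_frames(frames, k=5):
    """Pick the k frames with the largest foreground energy.

    Background subtraction here is a simple median model over the
    whole sequence (an assumption for illustration; the paper's
    exact BGS method may differ).
    frames: array of shape (T, H, W), grayscale.
    """
    frames = np.asarray(frames, dtype=np.float64)
    background = np.median(frames, axis=0)       # static-scene estimate
    foreground = np.abs(frames - background)     # per-frame motion residual
    energy = foreground.reshape(len(frames), -1).sum(axis=1)
    # Indices of the k most "active" frames, returned in temporal order
    return np.sort(np.argsort(energy)[-k:])

# Toy sequence: 10 blank frames, with a bright blob in frames 3..5
seq = np.zeros((10, 32, 32))
for t in (3, 4, 5):
    seq[t, 10:20, 10:20] = 1.0
print(select_representative_frames(seq, k=3))  # → [3 4 5]
```

Only the selected frames would then be passed to the HOG and skeleton extraction stages, which is what keeps the preprocessing cost low.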

Highlights

  • Although Human Activity or Human Action Recognition (HAR) is an active research field, there are still key aspects that must be taken into consideration in order to accurately understand how people interact with each other or with digital devices [11, 12, 63]

  • The learning process comprises a feature reduction module based on Background Subtraction, Histogram of Oriented Gradients, and Skeletons (BGS-HOG-SKE); a deep learning module built on a Convolutional Neural Network (CNN) combined with an LSTM; and classification sub-modules using K-Nearest Neighbour (KNN) and a Softmax layer

  • Later, in “Experimental Results”, we show the main advantage of the Recurrent Neural Network (RNN) and deep LSTM, namely a higher accuracy rate in complex action recognition, compared with conventional methods
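The Softmax-KNN classification sub-module named in the highlights can be illustrated with a small sketch: a KNN majority vote taken over softmax probability vectors. This is a self-contained NumPy toy with made-up data, not the authors' implementation; `knn_predict` and the synthetic logits are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def knn_predict(train_probs, train_labels, query_probs, k=3):
    """Label each query by majority vote among its k nearest
    neighbours in softmax-probability space (Euclidean distance)."""
    preds = []
    for q in np.atleast_2d(query_probs):
        d = np.linalg.norm(train_probs - q, axis=1)
        votes = train_labels[np.argsort(d)[:k]]
        preds.append(np.bincount(votes).argmax())
    return np.array(preds)

# Toy example: logits from a hypothetical 2-class network head
rng = np.random.default_rng(0)
logits = np.vstack([rng.normal(loc=(2, -2), size=(20, 2)),
                    rng.normal(loc=(-2, 2), size=(20, 2))])
labels = np.array([0] * 20 + [1] * 20)
probs = softmax(logits)

query = softmax(np.array([[1.5, -1.5]]))
print(knn_predict(probs, labels, query, k=3))  # → [0]
```

Voting in probability space rather than on raw logits keeps the neighbour distances bounded and comparable across samples, which is one plausible reason for pairing KNN with a Softmax layer.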



Introduction

Although Human Activity or Human Action Recognition (HAR) is an active research field, there are still key aspects that must be taken into consideration in order to accurately understand how people interact with each other or with digital devices [11, 12, 63]. A human activity is a sequence of multiple, complex sub-actions, and its automatic recognition using computer vision has recently been investigated by many researchers around the world. The required information about a given subject can be obtained using different types of sensors, such as cameras and wearable sensors [1, 39, 58]. Cameras are the more suitable sensors for security applications (such as intrusion detection) and other interactive applications. Activities in different directions, such as forward or backward movement or rotation, can be identified.

SN Computer Science

Related Work
Methodology
Experimental Results
Methods
