Spatio-Temporal Features based Human Action Recognition using Convolutional Long Short-Term Deep Neural Network

A F M Saifuddin Saif, Ebisa D Wollega, Sylvester A Kalevela

doi:10.14569/ijacsa.2023.0140501


Open Access

https://doi.org/10.14569/ijacsa.2023.0140501


Abstract

Recognizing human intention is crucial yet challenging because the motion patterns across a series of evolving actions are subtle. Understanding human actions is the foundation of many applications, e.g., human-robot interaction, smart video monitoring, and autonomous driving. Existing deep learning methods use either spatial or temporal features during training. This research focuses on developing a lightweight method that uses both spatial and temporal features to predict human intention correctly. It proposes the Convolutional Long Short-Term Deep Network (CLSTDN), which consists of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The CNN uses Inception-ResNet-v2 to classify object-specific class categories by extracting spatial features, and the RNN uses Long Short-Term Memory (LSTM) for the final prediction based on temporal features. The proposed method was validated on four challenging benchmark datasets, namely UCF Sports, UCF-11, KTH, and UCF-50. Its performance was evaluated using seven performance metrics, namely accuracy, precision, recall, F-measure, error rate, loss, and confusion matrix. The proposed method showed better results than those reported in existing research. It is expected to encourage researchers to adopt it in future real-time applications to predict human intentions more robustly.
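Several of the metrics listed in the abstract (accuracy, error rate, precision, recall, F-measure) can be derived from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function names and the toy three-class matrix are hypothetical, and the abstract does not state whether per-class or averaged values were reported, so per-class values are shown here as an assumption.

```python
from typing import Dict, List


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0


def error_rate(cm: List[List[int]]) -> float:
    """Complement of accuracy."""
    return 1.0 - accuracy(cm)


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall and F-measure for each class (one-vs-rest)."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        results.append(
            {"precision": precision, "recall": recall, "f_measure": f_measure}
        )
    return results


if __name__ == "__main__":
    # Toy confusion matrix with made-up counts (assumption, not data from the paper).
    toy_cm = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]
    print("accuracy:", accuracy(toy_cm))
    print("error rate:", error_rate(toy_cm))
    for label, metrics in enumerate(per_class_metrics(toy_cm)):
        print(label, metrics)
```

Loss is not covered by this sketch because it depends on the model's predicted probabilities rather than on the confusion matrix.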

Full Text