Abstract

In recent years, human action recognition has commonly been modeled over spatio-temporal video volumes. The field has expanded rapidly owing to its growing real-world applications, such as visual surveillance, autonomous driving, and entertainment. In particular, the spatio-temporal interest points (STIPs) approach has been widely and efficiently used to represent actions for recognition. In this work, two novel STIP-based action descriptors are proposed, the Two-Dimensional Difference Intensity Distance Group Pattern (2D-DIDGP) and the Three-Dimensional Difference Intensity Distance Group Pattern (3D-DIDGP), for representing and recognizing human actions in video sequences. First, the approach captures local motion in a video in a manner invariant to changes in size and shape. It is then extended to build unique and discriminative feature descriptions that improve the action recognition rate. Transform methods, namely the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and a hybrid DWT+DCT, are applied to the descriptors. The proposed approach is validated on the UT-Interaction dataset, which has been extensively studied in prior work. Classification is performed with Support Vector Machine (SVM) and Random Forest (RF) classifiers. The results show that the proposed DIDGP-based descriptors yield promising action recognition performance. Notably, 3D-DIDGP predominantly outperforms state-of-the-art algorithms.
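To make the transform-and-classify stage of the pipeline concrete, the sketch below illustrates it under stated assumptions. The DIDGP descriptors themselves are the paper's contribution and are not specified in the abstract, so the `descriptors` array is a hypothetical stand-in for per-video feature vectors; only the DCT step and the SVM/RF comparison mirror what the abstract names, using standard SciPy and scikit-learn calls rather than the authors' code.

```python
# Minimal sketch, assuming descriptors is an (N, D) array of per-video
# feature vectors (a placeholder for the DIDGP features, which are not
# specified in the abstract).
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((200, 64))   # hypothetical DIDGP features
labels = rng.integers(0, 6, size=200)          # e.g., 6 UT-Interaction classes

# Transform step: a type-II DCT along the feature axis compacts each
# descriptor's energy into a few low-frequency coefficients.
features = dct(descriptors, type=2, norm="ortho", axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)

# Classification step: the abstract compares SVM and Random Forest.
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=100))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))
```

The hybrid DWT+DCT variant mentioned in the abstract would replace the single DCT call with a wavelet decomposition followed by a DCT of the retained coefficients; the classification stage is unchanged.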
