Abstract

Human pose recognition from video is an emerging research topic for tracking human movement. The objective of this work is to develop a deep multimodal Spatio-Temporal Harris Hawk Optimized Pose Recognition (STHHO-PR) framework for self-learning of fitness exercises. The STHHO-PR framework uses both an audio modality and a visual modality to classify poses. In the audio modality, a VGG-16 network extracts audio features for fitness pose recognition. In the visual modality, Harris Hawks Optimization (HHO) is combined with the Minimum Cross Entropy (MCE) method to find the optimal threshold values for body part segmentation. The segmented body parts highlight the human joint points, which are connected through a skeletonization process to extract skeletal information. The spatio-temporal features extracted from the audio and visual modalities are optimally fused and passed to the classification stage, where a Weighted Majority Voting Ensemble (WMVE) classifier is used to build the classification model. The framework is evaluated on yoga videos drawn from publicly available datasets. The results show that STHHO-PR outperforms other state-of-the-art methods in terms of prediction accuracy.
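
The weighted majority voting step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the base classifiers are chosen or how their weights are derived, so the function name `weighted_majority_vote`, the pose labels, and the weight values below are all hypothetical. The sketch only shows the general idea that each base classifier's vote counts in proportion to its weight.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per base classifier (hypothetical inputs)
    weights:     one non-negative weight per base classifier, e.g. a
                 validation accuracy (an assumption, not stated in the abstract)
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    # Accumulate the weight behind each candidate label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Ties go to the label that was seen first among the predictions.
    return max(totals, key=totals.get)


# Three hypothetical base classifiers voting on one video clip.
winner = weighted_majority_vote(["tree", "warrior", "warrior"], [0.9, 0.3, 0.4])
```

With these illustrative weights, the single high-weight vote for "tree" (0.9) outweighs the two lower-weight votes for "warrior" (0.7 in total), which is the behavior that distinguishes weighted voting from a plain majority vote.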
