Abstract

Vision-based Human Activity Recognition (HAR) is a challenging research task in sports. This paper aims to track players’ movements and recognize different types of sports activities in videos. It develops a Hybrid Optimized Multimodal SpatioTemporal Feature Fusion (HOM-STFF) model that uses skeletal information for vision-based sports activity recognition. The HOM-STFF model takes a deep multimodal feature fusion approach: features generated by a multichannel 1D-CNN and a 2D-CNN are combined through concatenative feature fusion. The fused features are fed into a 2-GRU model, which generates temporal features for activity recognition. The nature-inspired Bald Eagle Search Optimizer (BESO) is applied to optimize the network weights during training. Finally, the performance of the classification model is evaluated and compared on the task of identifying different activities in sports videos. Experiments were carried out on three vision-based sports datasets, namely Sports Videos in the Wild (SVW), UCF50 sports action, and a self-built dataset, on which the model achieved accuracies of 0.9813, 0.9506, and 0.9733, respectively. The results indicate that the proposed HOM-STFF model outperforms other state-of-the-art methods in activity detection capability.
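
As an illustration of the concatenative fusion step only, the following minimal sketch joins the feature vectors of two branches at each time step. The abstract does not state feature dimensions or whether fusion is applied per time step, so the shapes, the toy values, and the helper name `fuse_by_concatenation` are assumptions made for this example and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def fuse_by_concatenation(feats_a: List[Vector], feats_b: List[Vector]) -> List[Vector]:
    """Concatenate the feature vectors of two branches at each time step.

    Hypothetical helper: assumes both branches emit one vector per time step.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both branches must cover the same number of time steps")
    # List addition appends branch B's features after branch A's.
    return [a + b for a, b in zip(feats_a, feats_b)]


# Toy stand-ins for branch outputs (assumed shapes): 3 time steps,
# 2 features from one branch and 3 features from the other.
branch_1d = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
branch_2d = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]

fused = fuse_by_concatenation(branch_1d, branch_2d)
```

Under these assumptions, each fused vector has 2 + 3 = 5 features, and the resulting sequence is the kind of input a recurrent model such as a GRU would consume one time step at a time.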
