With the rapid development of computer vision and deep learning, pose estimation and action recognition are increasingly applied in sports training. In tennis, complex movements, high speed, and limb occlusion make both tasks particularly challenging. To address this, the study first introduces selective dropout and a pyramid region-of-interest pooling layer into a fast region-based convolutional neural network. Second, it designs a pose estimation algorithm based on a multi-scale fusion pose residual network 50. Finally, it constructs a spatiotemporal graph convolutional network model that fuses a channel attention module with a multi-scale dilated convolution module. Experimental results showed that the improved pose residual network 50 achieved an average detection accuracy of 70.4%, with detection accuracies of 57.4%, 69.3%, and 79.2% for small, medium, and large objects, respectively. The improved spatiotemporal graph convolutional network achieved a continuous action recognition accuracy of 93.8% and an inter-action fluency detection time of 19.2 ms. With a sample size of 1000, its memory usage was 1378 MB and its running time was 32.7 ms. The experiments indicate that the improved model achieves high accuracy and robustness in tennis action recognition, with the clearest advantages in complex scenes and under limb occlusion. This study aims to provide an efficient and accurate motion recognition technique for tennis posture analysis and intelligent training.
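To make the term "multi-scale dilated convolution" concrete, the following is a minimal pure-Python sketch of the general idea: the same kernel is applied to a one-dimensional time series at several dilation rates, and the branch outputs are fused. It is not the authors' module. The function names, the dilation rates, and the choice of averaging as the fusion step are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form).

    Assumes an odd kernel length so the window is centered on each time step.
    """
    half = (len(kernel) // 2) * dilation  # reach of the window on each side
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j * dilation - half
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_scale_dilated(x: Sequence[float], kernel: Sequence[float],
                        dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Run parallel branches with different dilation rates and average them.

    Averaging is a hypothetical fusion choice; a real model might concatenate
    or sum the branches and would use learned kernels per branch.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]
```

Larger dilation rates let a branch cover a longer temporal span with the same number of weights, which is one plausible reason such a module suits fast, multi-phase movements like tennis strokes.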