Abstract

To address two shortcomings of traditional two-stream convolutional networks, namely the difficulty of learning spatial-temporal correlation information and low recognition accuracy, a yoga action recognition algorithm based on a Spatial-Temporal Fusion Residual Network (STF-ResNet) is proposed. Spatial-stream and temporal-stream features are mixed through residual connections so that the spatial-temporal features complement each other, and low-level features compensate for the information lost in high-level features. A Convolutional Block Attention Module (CBAM) is added before fusion to filter the yoga action features again along both the channel and spatial dimensions. Validated on a custom yoga dataset, the algorithm improves yoga recognition with an average accuracy of 98.6%, a 6.3% improvement over traditional two-stream convolutional networks.
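The abstract describes CBAM-style attention followed by residual mixing of the two streams. The following is a minimal NumPy sketch of that idea, not the paper's implementation: the channel branch reweights channels via a shared two-layer MLP over average- and max-pooled descriptors, and the spatial branch is simplified here by omitting the 7x7 convolution that the original CBAM applies to the pooled maps. All function and weight names (`channel_attention`, `w1`, `w2`, etc.) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM channel branch (sketch). x: (C, H, W);
    w1: (C//r, C), w2: (C, C//r) are a shared bottleneck MLP."""
    avg = x.mean(axis=(1, 2))                       # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                         # (C,) max-pooled descriptor
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)    # shared MLP on both descriptors
                  + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,) channel weights in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x):
    """CBAM spatial branch (simplified): the original applies a 7x7
    convolution to the stacked pooled maps; here they are summed directly."""
    avg = x.mean(axis=0)                            # (H, W) channel-average map
    mx = x.max(axis=0)                              # (H, W) channel-max map
    att = sigmoid(avg + mx)                         # (H, W) spatial weights in (0, 1)
    return x * att[None, :, :]

def residual_fusion(spatial_feat, temporal_feat, w1, w2):
    """Attend each stream with CBAM, then mix with a residual connection
    so the two streams complement each other (as the abstract describes)."""
    s = spatial_attention(channel_attention(spatial_feat, w1, w2))
    t = spatial_attention(channel_attention(temporal_feat, w1, w2))
    return s + t  # residual mixing of the attended streams
```

Since both attention maps pass through a sigmoid, each branch only rescales features into the range (0, 1) times the input, so the residual sum preserves the magnitude scale of the original streams.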
