Abstract

To address two shortcomings of traditional two-stream convolutional networks, namely the difficulty of learning spatial-temporal correlation information and low recognition accuracy, a yoga action recognition algorithm based on a Spatial-Temporal Fusion Residual Network (STF-ResNet) is proposed. Spatial-stream and temporal-stream features are mixed through residual connections so that the spatial-temporal features complement each other, and low-level features compensate for the information lost in high-level features. A Convolutional Block Attention Module (CBAM) is added before fusion to filter the yoga action features again along both the channel and spatial dimensions. Validated on a custom yoga dataset, the algorithm improves yoga recognition with an average accuracy of 98.6%, a 6.3% improvement over traditional two-stream convolutional networks.
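The abstract describes CBAM-style attention followed by residual mixing of the two streams. The following is a minimal NumPy sketch of that idea, not the paper's implementation: the channel branch reweights channels via a shared two-layer MLP over average- and max-pooled descriptors, and the spatial branch is simplified here by omitting the 7x7 convolution that the original CBAM applies to the pooled maps. All function and weight names (`channel_attention`, `w1`, `w2`, etc.) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM channel branch (sketch). x: (C, H, W);
    w1: (C//r, C), w2: (C, C//r) are a shared bottleneck MLP."""
    avg = x.mean(axis=(1, 2))                       # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                         # (C,) max-pooled descriptor
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)    # shared MLP on both descriptors
                  + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,) channel weights in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x):
    """CBAM spatial branch (simplified): the original applies a 7x7
    convolution to the stacked pooled maps; here they are summed directly."""
    avg = x.mean(axis=0)                            # (H, W) channel-average map
    mx = x.max(axis=0)                              # (H, W) channel-max map
    att = sigmoid(avg + mx)                         # (H, W) spatial weights in (0, 1)
    return x * att[None, :, :]

def residual_fusion(spatial_feat, temporal_feat, w1, w2):
    """Attend each stream with CBAM, then mix with a residual connection
    so the two streams complement each other (as the abstract describes)."""
    s = spatial_attention(channel_attention(spatial_feat, w1, w2))
    t = spatial_attention(channel_attention(temporal_feat, w1, w2))
    return s + t  # residual mixing of the attended streams
```

Since both attention maps pass through a sigmoid, each branch only rescales features into the range (0, 1) times the input, so the residual sum preserves the magnitude scale of the original streams.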
