With the rapid development of artificial intelligence and computer vision technology, dance action recognition and quality assessment have become a challenging and important research field. Traditional dance movement recognition methods often rely on manual feature extraction and expert scoring, which suffer from strong subjectivity, low efficiency, and difficulty in handling complex movements. To address these issues, this study proposes a dance action recognition and quality assessment method based on a Spatio-Temporal Convolutional Neural Network (ST-CNN). Specifically, the method comprises a feature extraction network built on 3D spatiotemporal descriptors and a multi-scale aggregated long short-term memory (LSTM) network: the former extracts spatial features from single frames, and the latter fuses them into a representation of the full video sequence. By combining information from the spatial and temporal dimensions, this approach effectively captures the dynamic changes in dance movements, thereby achieving more accurate recognition and evaluation. In addition, to mitigate the high storage and computational demands of convolutional neural networks during training, we integrate the dance video sequences into an IoT architecture that supports parallel recognition and evaluation of multiple videos. Finally, we validate the proposed method on the public NTU RGB+D 60 and NTU RGB+D 120 datasets, and the experimental results demonstrate its effectiveness.
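The abstract gives no implementation details, so as a rough illustration only, the sketch below shows how a pipeline of this shape (a 3D-convolutional per-frame feature extractor feeding a multi-scale LSTM aggregator) might be assembled in PyTorch. All layer sizes, the temporal scales, and the class count of 60 are assumptions for demonstration, not the authors' configuration.

```python
# Minimal sketch of an ST-CNN-style pipeline: 3D spatiotemporal feature
# extraction per clip, then multi-scale LSTM fusion over the frame axis.
# Hyperparameters here are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class STFeatureExtractor(nn.Module):
    """3D spatiotemporal descriptor: stacked Conv3d blocks over (T, H, W)."""
    def __init__(self, in_channels=3, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),      # pool space, keep time
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),       # collapse space only
        )
        self.proj = nn.Linear(128, feat_dim)

    def forward(self, x):                             # x: (B, C, T, H, W)
        f = self.conv(x)                              # (B, 128, T, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2) # (B, T, 128)
        return self.proj(f)                           # (B, T, feat_dim)

class MultiScaleLSTM(nn.Module):
    """Fuse per-frame features at several temporal scales, then classify."""
    def __init__(self, feat_dim=256, hidden=128, scales=(1, 2, 4), n_classes=60):
        super().__init__()
        self.scales = scales
        self.lstms = nn.ModuleList(
            nn.LSTM(feat_dim, hidden, batch_first=True) for _ in scales
        )
        self.head = nn.Linear(hidden * len(scales), n_classes)

    def forward(self, feats):                         # feats: (B, T, feat_dim)
        pooled = []
        for s, lstm in zip(self.scales, self.lstms):
            x = feats if s == 1 else nn.functional.avg_pool1d(
                feats.transpose(1, 2), kernel_size=s, stride=s
            ).transpose(1, 2)                         # downsample time by s
            _, (h, _) = lstm(x)                       # last hidden state
            pooled.append(h[-1])
        return self.head(torch.cat(pooled, dim=1))

# Usage: a batch of two 8-frame RGB clips at 112x112 resolution.
clips = torch.randn(2, 3, 8, 112, 112)
logits = MultiScaleLSTM()(STFeatureExtractor()(clips))
print(logits.shape)  # torch.Size([2, 60])
```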