With the rapid advancement of virtual reality, dynamic gesture recognition has become an essential technique for human–computer interaction in virtual environments. Recognizing dynamic gestures is challenging because gestures have many degrees of freedom and vary with the individual and with the space in which they are performed. To address the low recognition accuracy of existing networks, an improved dynamic gesture recognition algorithm based on the ResNeXt architecture is proposed. The algorithm uses three-dimensional convolutions to capture the spatiotemporal features of dynamic gestures. To sharpen the model’s focus and improve its recognition accuracy, a lightweight convolutional attention mechanism is introduced; this mechanism also speeds up convergence during training. To further improve performance, a deep attention submodule is added to the convolutional attention module to strengthen the network’s temporal feature extraction. Evaluations on the EgoGesture and NvGesture datasets show that the proposed model reaches recognition accuracies of 95.03% and 86.21%, respectively. In RGB mode, the accuracies are 93.49% and 80.22%, respectively. These results demonstrate that the proposed algorithm recognizes dynamic gestures with high accuracy and indicate its potential for advanced human–computer interaction systems.
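The attention idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the module's layout, so the sketch assumes a CBAM-style channel attention followed by a per-frame weighting along the depth (temporal) axis, which is one possible reading of the "deep attention submodule". The function names, the reduction ratio `r`, the kernel size, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, D, H, W) feature map.

    Assumed CBAM-style design: average- and max-pooled channel
    descriptors pass through a shared two-layer MLP (w1, w2).
    """
    avg = feat.mean(axis=(1, 2, 3))  # (C,)
    mx = feat.max(axis=(1, 2, 3))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    weights = sigmoid(mlp(avg) + mlp(mx))    # (C,), each in (0, 1)
    return feat * weights[:, None, None, None]

def depth_attention(feat, kernel):
    """Reweight frames along the depth (temporal) axis.

    Assumption: each frame is summarized by its mean and max over
    channels and space, and a small 1D convolution (kernel of shape
    (2, k), k odd) turns those summaries into per-frame weights.
    """
    avg = feat.mean(axis=(0, 2, 3))  # (D,)
    mx = feat.max(axis=(0, 2, 3))    # (D,)
    k = kernel.shape[1]
    pad = k // 2
    stacked = np.pad(np.stack([avg, mx]), ((0, 0), (pad, pad)))  # zero padding
    depth = feat.shape[1]
    logits = np.array(
        [np.sum(stacked[:, t:t + k] * kernel) for t in range(depth)]
    )
    weights = sigmoid(logits)        # (D,), each in (0, 1)
    return feat * weights[None, :, None, None]

def attention_block(feat, w1, w2, kernel):
    """Channel attention followed by depth attention (assumed ordering)."""
    return depth_attention(channel_attention(feat, w1, w2), kernel)

# Toy input standing in for the output of a 3D convolution stage.
rng = np.random.default_rng(0)
C, D, H, W, r = 8, 4, 5, 5, 2
feat = rng.standard_normal((C, D, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # hypothetical MLP weights
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 3)) * 0.1    # hypothetical temporal kernel
out = attention_block(feat, w1, w2, kernel)
```

Both stages only rescale the input by weights between 0 and 1, so the output keeps the input's shape and can be dropped between 3D convolution stages. In a trained network the weights would be learned rather than sampled at random.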