This paper introduces the Spatio-Temporal Cross Network (STCNet), a novel deep learning architecture for robust hand gesture recognition from surface electromyography (sEMG) signals across multiple subjects. We address the challenges posed by inter-subject variability and by factors such as electrode shift and muscle fatigue, which traditionally undermine the robustness of gesture recognition systems. STCNet integrates a convolutional-recurrent architecture with a spatio-temporal block that extracts features over segmented time intervals, enhancing both spatial and temporal analysis. In addition, we incorporate a rolling convolution technique designed to reflect the circular band structure of the sEMG measurement device, which captures the inherent spatial relationships among electrodes more effectively. We further propose a subject-aware contrastive learning framework that uses both subject and gesture label information to align the learned representation space. Comprehensive experimental evaluations demonstrate the superiority of STCNet under aggregated conditions: it achieves state-of-the-art performance on benchmark datasets and effectively handles the variability among subjects. The code is available at https://github.com/KNU-BrainAI/STCNet.
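
The abstract does not spell out how the rolling convolution is implemented. One plausible reading, offered here only as an illustrative sketch, is a convolution along the electrode axis with wrap-around (circular) indexing, so that the last electrode on the band is treated as a neighbour of the first. The function name `rolling_conv1d`, the 8-electrode frame, and the averaging kernel are all hypothetical and are not taken from the STCNet code.

```python
def rolling_conv1d(values, kernel):
    """Cross-correlate `values` with `kernel`, wrapping around the ends.

    `values` holds one reading per electrode, ordered around the armband,
    so the last electrode is treated as adjacent to the first.
    (Assumed interpretation; not the authors' implementation.)
    """
    n, k = len(values), len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd so it can be centred")
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Modulo indexing wraps past either end of the ring.
            acc += w * values[(i + j - half) % n]
        out.append(acc)
    return out


# Hypothetical 8-electrode frame and a 3-tap averaging kernel.
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
smoothed = rolling_conv1d(frame, [1 / 3, 1 / 3, 1 / 3])
```

Under this reading, rotating the electrode order (as happens when the band is worn at a different angle) rotates the output by the same amount rather than changing values at the boundary, which zero padding would not guarantee.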