Sign language recognition (SLR) is an effective way to bridge the communication barriers between hearing- and speech-impaired individuals and the wider community. Its applications extend to human–robot interaction (HRI), virtual reality (VR), and augmented reality (AR). However, the diverse nature of sign languages, stemming from varying user habits and geographical regions, poses significant challenges. To address these challenges, we propose the Skeleton-based Multi-feature Learning method (SML). The method comprises a Multi-Feature Aggregation (MFA) module, designed to capture the inherent relationships between different skeleton-based features and enable effective fusion of their complementary information. Furthermore, we propose the Self-Knowledge-Distillation-Guided Adaptive Residual Decoupled Graph Convolutional Network (SGAR-DGCN) for feature extraction. SGAR-DGCN consists of three components: a Self-Knowledge Distillation (SKD) mechanism that improves model training, convergence, and accuracy; a DGCN-Block, which combines a Decoupled GCN with Spatio-Temporal-Channel attention (STC) for efficient feature extraction; and an Adaptive Residual Block (ARes-Block) for cross-layer information fusion. Experimental results demonstrate that our SML method outperforms state-of-the-art approaches on the WLASL (55.85%) and AUTSL (96.85%) datasets while using skeleton data alone. Code is available at https://github.com/DzwFine37/SML.
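To make the building blocks named above more concrete, the following is a minimal PyTorch sketch of two of the ideas the abstract mentions: a decoupled graph convolution (per-channel-group learnable adjacency) and an adaptive, learnable-gated residual connection. All class names, layer sizes, and the joint count are illustrative assumptions, not the authors' implementation; the released code at the repository above is the authoritative reference.

```python
import torch
import torch.nn as nn


class DecoupledGCN(nn.Module):
    """Illustrative decoupled graph convolution: each channel group gets its
    own learnable adjacency matrix instead of sharing a single graph.
    (Hypothetical sketch; sizes and names are assumptions.)"""

    def __init__(self, in_ch, out_ch, num_joints=27, groups=8):
        super().__init__()
        assert out_ch % groups == 0
        self.groups = groups
        # One learnable adjacency per channel group, initialized to identity.
        self.adj = nn.Parameter(torch.eye(num_joints).repeat(groups, 1, 1))
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):                      # x: (N, C, T, V)
        x = self.proj(x)                       # (N, out_ch, T, V)
        n, c, t, v = x.shape
        x = x.view(n, self.groups, c // self.groups, t, v)
        # Aggregate joint features with the group-specific adjacency.
        x = torch.einsum('ngctv,gvw->ngctw', x, self.adj)
        return self.bn(x.reshape(n, c, t, v))


class DGCNBlock(nn.Module):
    """One spatio-temporal block with an adaptive (learnable-gated) residual:
    out = ReLU(TCN(GCN(x)) + alpha * skip(x)). Sketch only."""

    def __init__(self, in_ch, out_ch, num_joints=27):
        super().__init__()
        self.gcn = DecoupledGCN(in_ch, out_ch, num_joints)
        self.tcn = nn.Sequential(               # temporal convolution over frames
            nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1), padding=(4, 0)),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.alpha = nn.Parameter(torch.zeros(1))   # adaptive residual gate
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.tcn(self.gcn(x)) + self.alpha * self.skip(x))


if __name__ == "__main__":
    # Toy input: batch of 2 clips, 3 coordinate channels, 64 frames, 27 joints.
    x = torch.randn(2, 3, 64, 27)
    y = DGCNBlock(3, 64)(x)
    print(y.shape)  # torch.Size([2, 64, 64, 27])
```

The gate `alpha` starts at zero, so each block initially behaves like its main branch and learns how much cross-layer information to mix in; this is one common way to realize an "adaptive residual", offered here only as a plausible reading of the abstract.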