More than 70 million people worldwide communicate using over 300 sign languages, yielding a vast number of sign language categories. Sign language recognition faces two main challenges. First, in real-world applications, signers may not be represented in the training dataset, which weakens the recognition ability of models. Second, constructing large-scale sign language datasets is time-consuming and labor-intensive. In addition, existing sign language recognition models have complex structures and therefore overfit severely on single-frequency datasets. To address these challenges, we develop a signer-independent sign language recognition method for single-frequency datasets. Specifically, we propose the SwC GR-MMixer model, built on the Gated Recurrent Unit (GRU) and the Multi-Layer Perceptron (MLP), which significantly reduces model complexity. Through extensive ablation experiments, we determine the most suitable SwC GR-MMixer structure and data augmentation methods for single-frequency datasets. Using a mask replacement method, we achieve the best performance to date on the CSL-500 dataset and tackle the core challenges of sign language recognition: signer independence and data scarcity in single-frequency datasets (e.g., the CSL-500 dataset contains only one demonstration per signer). By further leveraging a spatial feature extraction method, we accomplish signer-independent sign language recognition, achieving a 6.95% improvement under the signer-independent setting on the LSA64 dataset.
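The abstract does not specify how the mask replacement augmentation operates; a minimal sketch of one plausible reading, assuming it randomly selects a fraction of frames in a feature sequence and replaces them with a constant mask value (the function name, mask ratio, and mask value are all illustrative assumptions, not the paper's method):

```python
import numpy as np

def mask_replace(sequence, mask_ratio=0.15, mask_value=0.0, rng=None):
    """Hypothetical mask-replacement augmentation (illustrative only).

    Randomly picks `mask_ratio` of the frames in a (frames, features)
    sequence and replaces them with `mask_value`, returning a new array.
    """
    rng = rng if rng is not None else np.random.default_rng()
    seq = sequence.copy()                     # do not modify the input in place
    n_frames = seq.shape[0]
    n_mask = max(1, int(round(mask_ratio * n_frames)))
    idx = rng.choice(n_frames, size=n_mask, replace=False)
    seq[idx] = mask_value                     # overwrite the chosen frames
    return seq

# Example: augment a dummy 20-frame sequence of 64-dim features
x = np.ones((20, 64))
aug = mask_replace(x, mask_ratio=0.15, rng=np.random.default_rng(0))
```

With a 15% ratio on 20 frames, exactly 3 frames are replaced; applying the function with different random seeds yields multiple variants of the single demonstration per signer, which is one way such a method could mitigate data scarcity.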