Abstract
Gesture recognition based on surface electromyography (sEMG) has been widely used in the field of human-machine interaction (HMI). However, sEMG has limitations, such as a low signal-to-noise ratio and insensitivity to fine finger movements, so we consider adding A-mode ultrasound (AUS) to improve recognition performance. To explore the influence of multisource sensing data on gesture recognition and to better integrate the features of the different modalities, we propose a multimodal multilevel converged attention network (MMCANet) for multisource signals composed of sEMG and AUS. The proposed model extracts the hidden features of the AUS signal with a convolutional neural network (CNN). Meanwhile, a hybrid CNN-LSTM (long short-term memory) structure extracts spatial-temporal features from the sEMG signal. The two types of CNN features, from AUS and sEMG, are then spliced and passed to a transformer encoder, which fuses the information and interacts with the sEMG features to produce hybrid features. Finally, fully connected layers output the classification results. Attention mechanisms are used to adjust the weights of the feature channels. We compared MMCANet's feature extraction and classification performance with that of manually extracted sEMG-AUS features classified by four traditional machine-learning (ML) algorithms; the recognition accuracy increased by at least 5.15%. In addition, we tested deep learning (DL) methods using a CNN on each single modality. The experimental results showed that the proposed model outperformed the CNN method with sEMG alone and with AUS alone by 14.31% and 3.80%, respectively. Compared with some state-of-the-art fusion techniques, our method also achieved better results.
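The data flow described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the abstract gives no layer sizes, channel counts, window lengths, or number of gesture classes, and it does not name the type of channel attention, so every such value below is an assumption. The class names `ChannelAttention` and `FusionSketch` are hypothetical, a squeeze-and-excitation style gate stands in for the unspecified attention mechanism, and joining the transformer output with the LSTM summary by concatenation is only one possible reading of how the fused features "interact with sEMG features".

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel gate (assumed attention type)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length). Average over length, then rescale channels.
        weights = self.gate(x.mean(dim=-1))
        return x * weights.unsqueeze(-1)


class FusionSketch(nn.Module):
    """Hypothetical two-branch fusion model; all sizes are assumptions."""

    def __init__(self, emg_ch=4, aus_ch=4, d_model=32, n_classes=8):
        super().__init__()
        # CNN branch for the AUS signal.
        self.aus_cnn = nn.Sequential(
            nn.Conv1d(aus_ch, d_model, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            ChannelAttention(d_model),
            nn.AdaptiveAvgPool1d(16),
        )
        # CNN part of the CNN-LSTM branch for the sEMG signal.
        self.emg_cnn = nn.Sequential(
            nn.Conv1d(emg_ch, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            ChannelAttention(d_model),
            nn.AdaptiveAvgPool1d(16),
        )
        self.emg_lstm = nn.LSTM(d_model, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, emg: torch.Tensor, aus: torch.Tensor) -> torch.Tensor:
        aus_feat = self.aus_cnn(aus).transpose(1, 2)  # (batch, 16, d_model)
        emg_feat = self.emg_cnn(emg).transpose(1, 2)  # (batch, 16, d_model)
        # The LSTM summarizes the sEMG CNN features over time.
        _, (h_n, _) = self.emg_lstm(emg_feat)
        emg_temporal = h_n[-1]  # (batch, d_model)
        # Splice both CNN feature sequences and fuse them with the encoder.
        tokens = torch.cat([aus_feat, emg_feat], dim=1)  # (batch, 32, d_model)
        fused = self.encoder(tokens).mean(dim=1)  # (batch, d_model)
        # Combine the fused features with the sEMG temporal features.
        hybrid = torch.cat([fused, emg_temporal], dim=-1)
        return self.head(hybrid)  # (batch, n_classes) class logits


model = FusionSketch()
emg = torch.randn(2, 4, 200)   # 2 windows, 4 sEMG channels, 200 samples (assumed)
aus = torch.randn(2, 4, 1000)  # 2 windows, 4 AUS channels, 1000 samples (assumed)
logits = model(emg, aus)
```

With these assumed shapes, `logits` has one row per input window and one column per gesture class. The adaptive pooling layers bring both modalities to the same sequence length, which lets windows with different sampling rates be spliced along the token axis before the encoder.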