Real-time human-machine interface (HMI) applications require a robust pattern recognition framework. Convolutional neural networks and recurrent neural networks have been widely used for electromyography (EMG)-based gesture classification, but few studies have demonstrated the effectiveness of a vision transformer (ViT) for this purpose. Moreover, the achievable accuracy depends on the efficacy of the preprocessing pipeline. This study assessed ViT models with and without an attention mechanism for precise motor intent decoding by investigating various input features and integrating convolutive blind source separation (BSS) preprocessing. All investigations were carried out on two open-access high-density surface EMG datasets comprising 34 and 21 hand gestures recorded from 20 and 5 healthy subjects, respectively. Integrating centering and an optimal extension factor improved performance on raw input, whereas spatial whitening increased the model's sensitivity to noise. The best-performing BSS-integrated convolutional vision transformer (BSS-CViT) model yielded accuracies of 96.61% and 91.98% on test datasets one and two, respectively. This is a promising result for future studies on real-time HMI applications. The code implementation of the results reported in this study is available on GitHub: https://github.com/deremustapha/BSS-ViT.
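For context, the convolutive BSS preprocessing steps named in the abstract (centering, time-delayed extension, and optional spatial whitening) can be sketched roughly as follows. This is a minimal NumPy illustration assuming (channels × samples) EMG windows; the function names, the extension factor of 4, and the placeholder data are hypothetical and not taken from the paper's implementation (see the linked GitHub repository for the actual code).

```python
import numpy as np

def extend_signal(emg, ext_factor):
    """Time-delayed extension used in convolutive BSS: each channel is
    stacked with delays 0..ext_factor-1, approximating the convolutive
    mixture as an instantaneous one.

    emg: (channels, samples) array
    returns: (channels * ext_factor, samples) array
    """
    ch, n = emg.shape
    ext = np.zeros((ch * ext_factor, n))
    for d in range(ext_factor):
        ext[d * ch:(d + 1) * ch, d:] = emg[:, :n - d]
    return ext

def center(x):
    """Remove the per-channel mean (the 'centering' step)."""
    return x - x.mean(axis=1, keepdims=True)

def whiten(x, eps=1e-9):
    """Spatial whitening via eigendecomposition of the covariance.
    Per the abstract, this step increased the downstream model's
    sensitivity to noise."""
    cov = np.cov(x)
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return w @ x

# Hypothetical usage on one window of 64-channel HD-sEMG
emg = np.random.randn(64, 2048)               # placeholder data
z = center(extend_signal(emg, ext_factor=4))  # fed to the CViT classifier
```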