Recognizing visual objects from single-trial electroencephalography (EEG) signals is a promising brain-computer interface (BCI) technology. However, achieving high-precision recognition remains challenging due to redundant features in noisy multi-channel EEG signals. Recent deep learning approaches commonly extract spatio-temporal features of EEG signals but neglect important spectral-temporal features, which may degrade recognition performance. To address this deficiency, we propose a novel channel attention weighting and multi-level adaptive spectral aggregation based dual-branch spatio-temporal-spectral Transformer feature fusion network (CAW-MASA-STST) for EEG-based visual recognition. Specifically, we first develop a channel attention weighting (CAW) module to automatically learn the channel weights of EEG signals. Then, a graph convolution-based multi-level adaptive spectral aggregation (MASA) module is employed to aggregate the spectral-temporal features of different sub-bands. Finally, a spatio-temporal-spectral Transformer (STST) is designed to fuse the spatio-temporal and spectral-temporal features, enhancing the network's comprehensive learning ability by modeling the temporal dependencies of the fused features. Experimental results on two public datasets demonstrate that the proposed method achieves superior recognition performance compared with state-of-the-art methods, indicating a feasible solution for visual recognition-based BCI technology. The code of our proposed method will be available at https://github.com/ljbuaa/VisualDecoding.
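To illustrate the channel-weighting idea behind CAW, the following is a minimal sketch, assuming a PyTorch implementation with squeeze-and-excitation-style gating over the electrode dimension; the module name, reduction ratio, and gating design are hypothetical and not specified in the abstract:

```python
import torch
import torch.nn as nn

class ChannelAttentionWeighting(nn.Module):
    """Hypothetical sketch of a CAW-style module: learns a weight per
    EEG electrode and reweights the raw signal. The paper's actual CAW
    design may differ. Input shape: (batch, channels, time)."""

    def __init__(self, n_channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze the temporal dimension into a per-channel descriptor.
        descriptor = x.mean(dim=-1)          # (batch, channels)
        weights = self.fc(descriptor)        # (batch, channels)
        return x * weights.unsqueeze(-1)     # reweight each channel


# Example: a batch of 64-channel single-trial EEG epochs, 440 samples each.
eeg = torch.randn(8, 64, 440)
caw = ChannelAttentionWeighting(n_channels=64)
print(caw(eeg).shape)  # torch.Size([8, 64, 440])
```

In this reading, channels carrying redundant or noisy activity receive weights near zero, so downstream spatio-temporal and spectral-temporal branches operate on an attention-filtered signal.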