Abstract

Speech imagery is emerging as a significant neuro-paradigm for designing electroencephalography (EEG)-based brain–computer interface (BCI) systems for rehabilitation, medical neurology, and assisting people with disabilities in interacting with their surroundings. Neural correlates of speech imagery in EEG signals are variable and weak compared with those of the vocalized state, so they are challenging to interpret with machine learning (ML)-based classifiers. Modern deep learning methods such as convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks have substantially advanced complex EEG signal analysis relative to ML-based methods. The objective of this article is to design a firefly-optimized discrete wavelet transform (DWT) and CNN-Bi-LSTM-based imagined speech recognition (ISR) system to interpret imagined speech EEG signals. This study uses two publicly available datasets. The EEG signal is enhanced using firefly optimization algorithm (FOA)-optimized soft thresholding of the high-frequency detail components obtained by DWT decomposition. The enhanced EEG signal is then augmented with sliding-window data augmentation to increase the amount of training data. Frequency-domain features, including power spectral density (PSD), frequency band power (FBP), band ratios, peak frequency, mean frequency, median frequency, spectral entropy, and relative power, are extracted from the augmented EEG segments. The extracted feature vector is fed to the designed CNN-Bi-LSTM classifier, which classifies the EEG data into two-class, three-class, and four-class categories. To achieve optimal performance, the CNN-Bi-LSTM model was tuned using the Keras Tuner library. The designed CNN consists of one-dimensional (1-D) convolutional layers and max-pooling layers that learn local associations and mine hierarchical relationships, while the Bi-LSTM network captures long-term dependencies from the features learned by the preceding CNN. The Bi-LSTM network improves performance and learns potentially richer representations by processing the sequence in both forward and backward directions, capturing patterns that a unidirectional model alone might leave unexploited. The performance of the designed FOA-DWT-CNN-Bi-LSTM-based ISR system is assessed using four evaluation measures: accuracy, F1 score, recall, and precision. The proposed system achieves the highest classification accuracies of 99.43 ± 2.5%, 94.41 ± 3.31%, and 89.57 ± 4.3% for the two-class, three-class, and four-class categories, respectively.
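
As a rough illustration of the signal-enhancement step described above, the sketch below soft-thresholds the DWT detail coefficients with a threshold chosen by a simple firefly-style search. It is a minimal sketch, assuming PyWavelets and NumPy; the wavelet ('db4'), decomposition level, surrogate cost function, and firefly parameters are illustrative placeholders, not the settings used by the authors.

```python
# Hypothetical sketch: FOA-selected soft threshold for DWT detail coefficients.
# The cost() surrogate and all parameters are assumptions for illustration only.
import numpy as np
import pywt

def soft_threshold(coeffs, thr):
    """Soft-threshold every detail sub-band, keep the approximation band."""
    return [coeffs[0]] + [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]

def denoise(signal, thr, wavelet='db4', level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return pywt.waverec(soft_threshold(coeffs, thr), wavelet)[:len(signal)]

def cost(signal, thr):
    """Surrogate objective: trade removed noise energy against distortion."""
    den = denoise(signal, thr)
    residual = signal - den
    return np.var(residual) + 0.5 * abs(np.var(den) - np.var(signal))

def firefly_threshold(signal, n_fireflies=10, n_iter=30,
                      beta0=1.0, gamma=1.0, alpha=0.1,
                      rng=np.random.default_rng(0)):
    """Minimal firefly search over a scalar threshold for the detail bands."""
    upper = np.max(np.abs(pywt.wavedec(signal, 'db4', level=4)[-1]))
    pos = rng.uniform(0, upper, n_fireflies)          # positions = candidate thresholds
    light = np.array([cost(signal, p) for p in pos])  # brightness ~ negative cost
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] < light[i]:               # j is "brighter" (lower cost)
                    r = abs(pos[i] - pos[j])
                    beta = beta0 * np.exp(-gamma * r ** 2)
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * rng.normal()
                    pos[i] = np.clip(pos[i], 1e-6, upper)
                    light[i] = cost(signal, pos[i])
    best = pos[np.argmin(light)]
    return best, denoise(signal, best)

# Usage on a single synthetic EEG-like channel:
eeg = np.sin(np.linspace(0, 20 * np.pi, 1024)) \
      + 0.3 * np.random.default_rng(1).normal(size=1024)
thr, enhanced = firefly_threshold(eeg)
```

The surrogate cost here merely balances residual energy against distortion of the retained signal; the objective actually optimized by the FOA in the paper may differ.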
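
The CNN-Bi-LSTM classifier can be sketched in Keras along the following lines. This is a minimal sketch that assumes the extracted spectral features are arranged as sequences of shape (time_steps, n_features); the layer counts, filter sizes, and dropout rate are placeholders rather than the hyperparameters found with Keras Tuner.

```python
# Minimal sketch of a 1-D CNN + Bi-LSTM classifier; all sizes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(time_steps, n_features, n_classes):
    model = models.Sequential([
        layers.Input(shape=(time_steps, n_features)),
        # 1-D convolutions + max pooling learn local patterns and
        # hierarchical feature maps from the spectral feature sequence.
        layers.Conv1D(64, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),
        # The Bi-LSTM reads the CNN feature sequence forward and backward
        # to capture long-term dependencies in both directions.
        layers.Bidirectional(layers.LSTM(64, return_sequences=False)),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Example: a four-class configuration with 32 time steps of 40 features.
model = build_cnn_bilstm(time_steps=32, n_features=40, n_classes=4)
model.summary()
```

The Bidirectional wrapper is what gives the model access to dependencies in both temporal directions, which a forward-only LSTM would miss.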
