Neurological disorders affecting speech production adversely impact quality of
life for over 7 million individuals in the US. Traditional speech interfaces such as eye-tracking devices and P300 spellers are slow and unnatural for these patients. Speech Brain-Computer Interfaces (BCIs) offer an alternative solution: by decoding speech characteristics directly, they provide a more natural communication mechanism. This research
explores the feasibility of decoding speech features using non-invasive EEG. Nine
neurologically intact participants were fitted with a 63-channel EEG system together with additional sensors used to remove eye artifacts. Participants read aloud on-screen sentences that were selected for their phonetic similarity to the English language. Deep
learning models, including Convolutional Neural Networks and Recurrent Neural
Networks with and without attention modules, were optimized to minimize trainable parameters and to operate on small input windows, with real-time application in mind. These models were employed for discrete and continuous speech decoding
tasks, achieving statistically significant participant-independent decoding performance
for discrete classes and continuous characteristics of the produced audio signal. A
frequency sub-band analysis highlighted the importance of the delta, theta, and gamma bands for decoding performance, and a perturbation analysis was used to identify the most influential channels. The channel-selection methods assessed did not significantly improve performance, suggesting that speech information is encoded in a distributed manner across the EEG signals. Leave-One-Out training demonstrated the feasibility of leveraging speech neural correlates common across participants, thereby reducing the data collection requirements for individual participants.
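
To make the compact model family and the cross-participant evaluation described above concrete, here is a minimal sketch: a small CNN + GRU decoder with a simple attention module, trained and tested leave-one-participant-out on synthetic data. This is an illustrative sketch, not the thesis implementation; the PyTorch framing, all layer sizes, the two-class task, and the synthetic 63-channel, 128-sample windows are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn


class CompactEEGDecoder(nn.Module):
    """Small CNN + GRU + attention classifier for short EEG windows."""

    def __init__(self, n_channels=63, n_classes=2, hidden=32):
        super().__init__()
        # A temporal convolution followed by a spatial filter across channels
        # keeps the trainable-parameter count low.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 17), padding=(0, 8), bias=False),
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1), bias=False),
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
        )
        self.gru = nn.GRU(16, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # additive attention over time steps
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                    # x: (batch, channels, time)
        x = self.conv(x.unsqueeze(1))        # -> (batch, 16, 1, time // 4)
        x = x.squeeze(2).permute(0, 2, 1)    # -> (batch, time // 4, 16)
        h, _ = self.gru(x)                   # -> (batch, time // 4, hidden)
        w = torch.softmax(self.attn(h), dim=1)
        return self.head((w * h).sum(dim=1))  # attention-weighted summary


def loso_accuracy(data, epochs=20, lr=1e-3):
    """Leave-one-participant-out: train on all but one, test on the held-out one."""
    accs = []
    for held_out in sorted(data):
        Xtr = np.concatenate([X for p, (X, _) in data.items() if p != held_out])
        ytr = np.concatenate([y for p, (_, y) in data.items() if p != held_out])
        Xte, yte = data[held_out]
        model = CompactEEGDecoder()
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        Xtr_t, ytr_t = torch.from_numpy(Xtr), torch.from_numpy(ytr)
        for _ in range(epochs):              # full-batch training for brevity
            opt.zero_grad()
            loss_fn(model(Xtr_t), ytr_t).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            pred = model(torch.from_numpy(Xte)).argmax(dim=1)
            accs.append((pred == torch.from_numpy(yte)).float().mean().item())
    return accs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Nine synthetic "participants": 40 windows each, 63 channels, 128 samples.
    data = {p: (rng.standard_normal((40, 63, 128)).astype(np.float32),
                rng.integers(0, 2, 40).astype(np.int64))
            for p in range(9)}
    print(loso_accuracy(data, epochs=2))
```

Full-batch training on random labels keeps the sketch self-contained; on real EEG, proper data loading, windowing, and a regression head for continuous targets would replace the synthetic pieces.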
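Similarly, one plausible reading of the perturbation analysis for channel importance, reusing the model and imports from the sketch above (the exact procedure is an assumption): replace one channel at a time with noise and record the resulting drop in test accuracy, with larger drops marking more influential channels.

```python
def accuracy(model, X, y):
    """Classification accuracy of `model` on numpy arrays X, y."""
    model.eval()
    with torch.no_grad():
        pred = model(torch.from_numpy(X)).argmax(dim=1)
        return (pred == torch.from_numpy(y)).float().mean().item()


def channel_importance(model, X, y, seed=0):
    """Perturbation analysis: noise out each channel, measure the accuracy drop."""
    rng = np.random.default_rng(seed)
    base = accuracy(model, X, y)
    drops = []
    for c in range(X.shape[1]):              # X: (windows, channels, time)
        Xp = X.copy()
        Xp[:, c, :] = rng.standard_normal(Xp[:, c, :].shape).astype(np.float32)
        drops.append(base - accuracy(model, Xp, y))
    return drops                             # one importance score per channel
```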