Abstract

Speech production is a hierarchical mechanism involving the synchronization of the brain and the oral articulators, where the intention of linguistic concepts is transformed into meaningful sounds. Individuals with locked-in syndrome (fully paralyzed but aware) completely lose motor ability, including articulation and even eye movement. The neural pathway may be the only option for these patients to resume some level of communication. Current brain-computer interfaces (BCIs) use patients' visual and attentional correlates to build communication, resulting in a slow communication rate (a few words per minute). Direct decoding of imagined speech from neural signals (and then driving a speech synthesizer) has the potential for a higher communication rate. In this study, we investigated the decoding of five imagined and spoken phrases from single-trial, non-invasive magnetoencephalography (MEG) signals collected from eight adult subjects. Two machine learning algorithms were used: an artificial neural network (ANN) with statistical features as the baseline approach, and convolutional neural networks (CNNs) applied to the spatial, spectral, and temporal features extracted from the MEG signals. Experimental results indicated the possibility of decoding imagined and spoken phrases directly from neuromagnetic signals. CNNs were found to be highly effective, with average decoding accuracies of up to 93% for the imagined and 96% for the spoken phrases.
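The CNN approach summarized above — single-trial MEG segments classified into one of five phrases — can be sketched as follows. This is a minimal illustrative sketch: the sensor count, window length, and layer sizes are assumptions for demonstration, not the study's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative assumptions (not the study's actual configuration):
N_SENSORS = 204   # hypothetical MEG sensor count
N_SAMPLES = 500   # hypothetical time points per single trial
N_PHRASES = 5     # five imagined/spoken phrases, as in the study

class PhraseCNN(nn.Module):
    """Toy CNN classifying a single-trial MEG segment into one of 5 phrases."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # 2-D convolutions span the sensor (spatial) and time axes
            nn.Conv2d(1, 8, kernel_size=(5, 5), padding=2),
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(8, 16, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size summary of the feature maps
        )
        self.classifier = nn.Linear(16 * 4 * 4, N_PHRASES)

    def forward(self, x):                 # x: (batch, 1, sensors, samples)
        z = self.features(x)
        return self.classifier(z.flatten(1))  # logits: (batch, N_PHRASES)

model = PhraseCNN()
trials = torch.randn(2, 1, N_SENSORS, N_SAMPLES)  # two synthetic trials
logits = model(trials)
print(logits.shape)
```

A real pipeline would train such a network on labeled trials; per the abstract, the authors also fed spectral and temporal feature representations (rather than only raw sensor-by-time arrays) into their CNNs.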

Highlights

  • Speech is an essential attribute of humans, with the execution of verbal communication being underpinned by a very complex—yet poorly understood—relationship between neural processing and articulation

  • The decoding performances during all four stages obtained with the artificial neural network (ANN) or convolutional neural networks (CNNs) were significantly higher than the chance-level accuracy (30%) (1-tail t-test, p < 0.05 for all)

  • The differences between the decoding performances of the ANN and the CNNs were statistically significant, as observed via pairwise comparison of the decoding performances with the ANN, the spectral-temporal CNN (ST-CNN), and the spatial-spectral-temporal CNN (SST-CNN) (2-tail t-test, p < 0.05 for all possible pairs)
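The chance-level comparison in the highlights can be sketched as below. The accuracy values are synthetic stand-ins for the eight per-subject results (not the study's data); SciPy's one-sample t-test with `alternative="greater"` gives the 1-tailed test against the 30% chance level.

```python
import numpy as np
from scipy import stats

# Synthetic per-subject decoding accuracies for eight subjects
# (illustrative values only, not the study's actual results).
rng = np.random.default_rng(0)
accuracies = rng.uniform(0.85, 0.96, size=8)

# 1-tailed one-sample t-test: are accuracies greater than 30% chance?
t_stat, p_value = stats.ttest_1samp(accuracies, popmean=0.30,
                                    alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4g}")
```

The paired ANN-vs-CNN comparison in the second highlight would instead use a 2-tailed paired test (e.g., `scipy.stats.ttest_rel`) on matched per-subject accuracies.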

Introduction

Speech is an essential attribute of humans, with the execution of verbal communication being underpinned by a very complex—yet poorly understood—relationship between neural processing and articulation. EEG is a reasonable choice for brain-based communication for patients with debilitating neurodegenerative diseases, primarily because of its non-invasiveness, low cost, and satisfactory temporal resolution (Birbaumer, 2006). The major disadvantage of current EEG-BCIs is the slow word synthesis rate of only a few words (<10) per minute (Birbaumer, 2006). This is mostly due to the passive letter-selection paradigm of EEG-BCI designs, where subjects are required to select control characters randomly displayed on a screen, prompted with visual or attentional correlates. Direct decoding of imagined speech offers an alternative to this selection paradigm: once the imagined or intended speech is generated internally, the neural signals can be decoded to text or speech parameters, and a text-to-speech synthesizer (Cao et al., 2017) or a vocoder (Akbari et al., 2019) can be used to construct speech immediately.
