Abstract

In a recent study of auditory evoked potential (AEP) based brain–computer interfaces (BCIs), it was shown that, with an encoder–decoder framework, it is possible to translate human neural activity to speech (T-CAS). Current encoder–decoder-based methods achieve T-CAS with a two-step approach in which information is passed between the encoder and decoder through a shared vector of reduced dimension, which may result in information loss. In this paper, we propose an end-to-end model to translate human neural activity to speech (ET-CAS) by introducing a dual–dual generative adversarial network (Dual-DualGAN) for cross-domain mapping between electroencephalogram (EEG) and speech signals. In this model, we bridge the EEG and speech signals by introducing transition signals, which are obtained by cascading the corresponding EEG and speech signals in a certain proportion. We then learn the mappings between the speech/EEG signals and the transition signals. We also develop a new EEG dataset in which the attention of the participants is checked before the EEG signals are recorded, to ensure that they are attentive while listening to the speech utterances. The proposed method can translate word-length and sentence-length sequences of neural activity to speech. Experimental results show that the proposed method significantly outperforms state-of-the-art methods on both word-level and sentence-level auditory stimuli.
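The transition-signal construction described above can be pictured with a small numerical sketch. The code below is illustrative only, assuming a single-channel EEG trace and a mono speech waveform; the function name `make_transition_signal`, the `ratio` parameter, and the resampling step are our own assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def make_transition_signal(eeg, speech, ratio=0.5, length=1024):
    """Hypothetical sketch: build a transition signal by cascading
    (concatenating) a proportion of an EEG segment with a proportion of
    the corresponding speech segment. `ratio` sets the fraction of the
    output drawn from the EEG side; the proportion actually used in the
    paper is a modeling choice not reproduced here."""
    n_eeg = int(round(ratio * length))   # samples taken from the EEG side
    n_speech = length - n_eeg            # samples taken from the speech side
    # Resample each part to its target number of samples so the transition
    # signal has a fixed length regardless of the input lengths.
    eeg_part = np.interp(np.linspace(0, len(eeg) - 1, n_eeg),
                         np.arange(len(eeg)), eeg)
    speech_part = np.interp(np.linspace(0, len(speech) - 1, n_speech),
                            np.arange(len(speech)), speech)
    return np.concatenate([eeg_part, speech_part])

# Usage: a 1-second EEG trace at 256 Hz and a speech clip at 16 kHz,
# combined into a 1024-sample transition signal, half from each domain.
eeg = np.random.randn(256)
speech = np.random.randn(16000)
transition = make_transition_signal(eeg, speech, ratio=0.5, length=1024)
print(transition.shape)  # (1024,)
```

In the Dual-DualGAN framing, such transition signals act as an intermediate domain, so that the EEG-to-speech mapping can be learned as two shorter mappings (EEG to transition, and transition to speech) rather than one direct cross-domain mapping.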
