Speech synthesis from intracranial stereotactic Electroencephalography using a neural vocoder

Frigyes Viktor Arthur, Tamás Gábor Csapó

doi:10.36244/icj.2024.1.6

Abstract

Speech is one of the most important human biosignals. However, only some of the speech production characteristics required for a successful speech-based Brain-Computer Interface (BCI) are fully understood. Building a brain-to-speech system that can generate full sentences intelligibly and naturally remains a great challenge. In our study, we used the SingleWordProduction-Dutch-iBIDS dataset, in which speech and intracranial stereotactic electroencephalography (sEEG) signals were recorded simultaneously during a single-word production task. We applied deep neural networks (FC-DNN, 2D-CNN, and 3D-CNN) to the data of ten speakers to predict Mel spectrograms from sEEG. Next, we synthesized speech using the WaveGlow neural vocoder. Our objective and subjective evaluations showed that the DNN-based approaches with the neural vocoder outperform the baseline linear regression model using Griffin-Lim. The synthesized samples resemble the original speech but are not yet intelligible, and the results are clearly speaker-dependent. In the long term, speech-based BCI applications might be useful for people with speech impairments or neurological disorders.
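The baseline mentioned in the abstract maps sEEG features to Mel spectrogram frames with linear regression. The following is a minimal sketch of such a frame-wise linear decoder, assuming the sEEG features have already been extracted per frame; feature extraction, any temporal context, and the Griffin-Lim or WaveGlow synthesis step are not shown. The function names, the small ridge term, and the toy data are hypothetical illustrations, not the authors' code. Pure Python is used only to keep the sketch dependency-free.

```python
"""Hypothetical frame-wise linear decoder: sEEG feature frames -> Mel frames.

Sketch only; not the authors' implementation.
"""
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column products; b is transposed once so its columns are easy to iterate.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B (A square, nonsingular) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # copy into [A | B]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right half now holds X


def add_bias(x: Matrix) -> Matrix:
    return [list(row) + [1.0] for row in x]


def fit_linear_decoder(x: Matrix, y: Matrix, ridge: float = 1e-3) -> Matrix:
    """Ridge-regularized least-squares weights W for y ~ [x, 1] @ W (normal equations).

    x: frames x sEEG features; y: frames x Mel bins.
    Returns a (features + 1) x Mel-bins matrix; the last row is the bias.
    The ridge term (an assumption) also penalizes the bias row, for simplicity.
    """
    xb = add_bias(x)
    xt = transpose(xb)
    gram = matmul(xt, xb)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(xt, y))


def predict(x: Matrix, w: Matrix) -> Matrix:
    """Predict one Mel frame per sEEG feature frame."""
    return matmul(add_bias(x), w)


if __name__ == "__main__":
    # Toy data: 6 frames, 3 "sEEG features", 2 "Mel bins" (made up, not from the dataset).
    x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
         [1.0, 1.0, 1.0], [3.0, 0.0, 1.0], [0.0, 2.0, 2.0]]
    y = [[0.5 * a - b + 0.2 * c + 0.1, 2.0 * c - 0.3] for a, b, c in x]
    w = fit_linear_decoder(x, y, ridge=1e-9)
    err = max(abs(p - t) for pr, tr in zip(predict(x, w), y) for p, t in zip(pr, tr))
    print(f"max reconstruction error: {err:.2e}")
```

With real data, a numerical library (for example NumPy's `linalg.solve`) would normally replace the hand-written solver. The DNN models described in the abstract replace this single linear mapping with a learned non-linear one, and the predicted Mel frames are then passed to a vocoder.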
