Abstract

We propose a new scheme for speaker-dependent silent speech recognition systems (SSRSs) that uses both single-trial scalp-recorded electroencephalograms (EEGs) and speech signals measured while subjects overtly and covertly spoke “janken” and “season” words in Japanese. The scheme consists of two phases. The learning phase specifies a Kalman filter relating spectrograms of the speech signals to independent components (ICs) of the EEGs recorded during overt speech, where the equivalent current dipole source localization (ECDL) solutions of these ICs were located mainly in Broca’s area. For the “season” task, the speech signals were additionally transcribed into vowel and consonant sequences, and the relationship between these sequences and the spectrograms was learned by hidden Markov models (HMMs) with Gaussian mixture densities. The decoding phase predicts spectrograms for the silent “janken” and “season” utterances by applying the Kalman filter to the EEGs recorded during silent speech. For the silent “season” task, the predicted spectrograms were input to the HMMs, and the silently spoken word was identified as the one whose HMM yielded the maximal log-likelihood. Our preliminary results, obtained as training steps, are as follows: the silent “janken” words were correctly discriminated, and the silent “season” HMMs performed well, suggesting that this scheme might be extended to discriminating between all pairs of hiragana.
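To make the two-phase pipeline concrete, the following is a minimal sketch (in Python, using NumPy and hmmlearn) of one plausible reading of the scheme: a linear-Gaussian state-space (Kalman) decoder that treats spectrogram frames as the hidden state and EEG ICs as observations, followed by per-word GMM-HMM scoring by log-likelihood. The variable names, least-squares fitting of the model matrices, and model dimensions are illustrative assumptions, not the authors’ exact implementation.

```python
# Hypothetical sketch of the two-phase scheme.
# Learning: fit a linear-Gaussian state-space model linking spectrogram frames
# (hidden state) to EEG ICs (observations) during overt speech.
# Decoding: run the Kalman filter on silent-speech ICs to predict spectrograms,
# then score them against per-word GMM-HMMs and pick the maximal log-likelihood.
import numpy as np
from hmmlearn.hmm import GMMHMM

def fit_kalman_decoder(spec, ics, reg=1e-6):
    """Least-squares estimates of the state-space matrices.
    spec: (T, d_s) spectrogram frames (state); ics: (T, d_o) EEG ICs (observations)."""
    X0, X1 = spec[:-1], spec[1:]
    A = np.linalg.solve(X0.T @ X0 + reg * np.eye(X0.shape[1]), X0.T @ X1).T  # state transition
    W = np.cov((X1 - X0 @ A.T).T)                                            # process noise
    H = np.linalg.solve(spec.T @ spec + reg * np.eye(spec.shape[1]), spec.T @ ics).T  # observation matrix
    Q = np.cov((ics - spec @ H.T).T)                                         # observation noise
    return A, W, H, Q

def kalman_predict_spectrogram(ics, A, W, H, Q, x0):
    """Standard Kalman filter recursion: predict spectrogram frames from silent-speech ICs."""
    d_s = A.shape[0]
    x, P = x0.copy(), np.eye(d_s)
    frames = []
    for z in ics:
        x, P = A @ x, A @ P @ A.T + W                  # time update
        S = H @ P @ H.T + Q
        K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
        x = x + K @ (z - H @ x)                        # measurement update
        P = (np.eye(d_s) - K @ H) @ P
        frames.append(x)
    return np.vstack(frames)

def classify_by_hmm(pred_spec, word_hmms):
    """Pick the word whose GMM-HMM gives the highest log-likelihood for the predicted spectrogram."""
    scores = {word: model.score(pred_spec) for word, model in word_hmms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical usage: one GMM-HMM per "season" word, trained on overt-speech spectrogram frames.
# word_hmms = {w: GMMHMM(n_components=5, n_mix=2).fit(frames) for w, frames in train_frames.items()}
```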

Highlights

  • The decipherment of human thought from brain activity, without recourse to speech or action, is one of the most attractive and challenging frontiers of modern science

  • We statistically examined the hypothesis underlying the Directions Into Velocities of Articulators (DIVA) model for our silent speech recognition systems (SSRSs)

  • Taking, for every task and every subject, the minimal p-value between each formant frequency and the independent components (ICs), significant correlations were found for both F1 (r = 0.62–0.88, p ≤ 6.00 × 10⁻¹⁵ for “rock”; r = 0.64–0.94, p ≤ 7.64 × 10⁻¹³ for “paper”; r = 0.54–0.95, p ≤ 3.27 × 10⁻¹³ for “scissors”) and F2 (r = 0.65–0.92, p ≤ 7.55 × 10⁻¹⁵ for “rock”; r = 0.60–0.98, p ≤ 2.83 × 10⁻¹¹ for “paper”; r = 0.68–0.92, p ≤ 7.88 × 10⁻¹⁵ for “scissors”) except for one subject (DK) (Table 1), confirming the hypothesis; a minimal sketch of this correlation test follows below
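As a concrete illustration of the statistic behind this highlight, the following is a minimal sketch, assuming time-aligned formant tracks and IC activation time courses, of how the Pearson correlation and its p-value could be computed per IC and the smallest p-value retained. The function and variable names are hypothetical, not taken from the study.

```python
# Hypothetical sketch: Pearson r and p-value between a formant track (F1 or F2)
# and each EEG IC time course, keeping the IC with the smallest p-value.
import numpy as np
from scipy.stats import pearsonr

def best_correlated_ic(formant, ics):
    """formant: (T,) formant-frequency track; ics: (n_ics, T) IC activations, time-aligned."""
    results = [pearsonr(formant, ic) for ic in ics]   # (r, p) for each IC
    best = int(np.argmin([p for _, p in results]))
    r, p = results[best]
    return best, r, p

# e.g. idx, r, p = best_correlated_ic(f1_track, ic_activations)
```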

Introduction

The decipherment of human thought from brain activity, without recourse to speech or action, is one of the most attractive and challenging frontiers of modern science. In addition to “physical” SSRSs [2,3,4,5], there are “electrical” ones, in which articulation is either inferred from actuator muscle signals or predicted from command signals obtained directly from the brain. The latter could serve as a speech prosthesis for individuals with severe communication impairments. We propose a new scheme for a speaker-dependent SSRS that uses single-trial scalp-recorded EEGs for silent vowel recognition and then generalize it to consonant recognition in Japanese. To exemplify this scheme, we carried out two experiments (Experiments I and II).
