Decoding Silent Speech Based on High-Density Surface Electromyogram Using Spatiotemporal Neural Network

Xi Chen, Xiang Chen, Xu Zhang, Xun Chen

doi:10.1109/tnsre.2023.3266299

Abstract

Finer-grained decoding at the phoneme or syllable level is a key technology for continuous recognition of silent speech based on the surface electromyogram (sEMG). This paper aims to develop a novel syllable-level decoding method for continuous silent speech recognition (SSR) using a spatio-temporal end-to-end neural network. In the proposed method, the high-density sEMG (HD-sEMG) was first converted into a series of feature images, and a spatio-temporal end-to-end neural network was then applied to extract discriminative feature representations and achieve syllable-level decoding. The effectiveness of the proposed method was verified with HD-sEMG data recorded by four 64-channel electrode arrays placed over the facial and laryngeal muscles of fifteen subjects subvocalizing 33 Chinese phrases consisting of 82 syllables. The proposed method outperformed the benchmark methods, achieving the highest phrase classification accuracy (97.17 ± 1.53%) and a lower character error rate (3.11 ± 1.46%). This study provides a promising approach to decoding sEMG for SSR, with great potential for applications in instant communication and remote control.
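The abstract does not state which features make up the "feature images" or how each electrode array is laid out. As a minimal sketch under stated assumptions (each 64-channel array treated as an 8 × 8 grid in row-major order, and windowed root-mean-square (RMS) amplitude used as the per-channel feature, neither of which is confirmed by the source), the conversion step could look like the following. The same block also computes the character error rate (CER), assuming the usual definition of edit distance divided by reference length. The function names `rms_feature_images` and `character_error_rate` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # one feature image: rows x cols of per-electrode values


def rms_feature_images(
    signal: Sequence[Sequence[float]],  # signal[t][ch]: samples x channels
    rows: int,
    cols: int,
    win: int,
    step: int,
) -> List[Grid]:
    """Slide a window over a multichannel recording and map each window's
    per-channel RMS onto a rows x cols grid.

    ASSUMPTIONS (not from the paper): RMS is the feature, and channel
    index ch maps to grid cell (ch // cols, ch % cols).
    """
    n_ch = rows * cols
    if any(len(frame) != n_ch for frame in signal):
        raise ValueError("each sample must have rows * cols channels")
    images: List[Grid] = []
    for start in range(0, len(signal) - win + 1, step):
        window = signal[start:start + win]
        # Root-mean-square amplitude of each channel within this window.
        rms = [
            math.sqrt(sum(frame[ch] ** 2 for frame in window) / win)
            for ch in range(n_ch)
        ]
        # Reshape the flat channel vector into a row-major grid (one "image").
        images.append([rms[r * cols:(r + 1) * cols] for r in range(rows)])
    return images


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference),
    i.e. the Levenshtein distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))  # distances from the empty prefix
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_ch in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                        # deletion
                curr[j - 1] + 1,                    # insertion
                prev[j - 1] + (ref_ch != hyp_ch),   # substitution or match
            )
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # Toy data: 100 samples from one hypothetical 8 x 8 array (64 channels).
    toy = [[math.sin(0.1 * t + ch) for ch in range(64)] for t in range(100)]
    imgs = rms_feature_images(toy, rows=8, cols=8, win=20, step=10)
    print(len(imgs), len(imgs[0]), len(imgs[0][0]))  # 9 8 8
    print(character_error_rate("abcd", "abxd"))       # 0.25
```

With four such arrays, one option would be to build one image sequence per array and stack them as channels before feeding a spatio-temporal network; how the authors actually combine the arrays is not described in the abstract.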
