Abstract

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave radar hardware to measure, at an update rate of 100 Hz, the changes in the transmission spectra during speech between three antennas located on both cheeks and the chin. We then recorded a command word corpus of 40 phonetically balanced, two-syllable German words and the German digits zero to nine for two individual speakers. Using a bidirectional long short-term memory (BiLSTM) network, we evaluated both the speaker-dependent multi-session and the inter-session recognition accuracies on this 50-word corpus. We obtained recognition accuracies of 99.17% in the speaker-dependent multi-session setting and 88.87% in the inter-session setting. These results show that the transmission spectra are very well suited for discriminating individual words from one another, even across different sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
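To make the classifier named above more concrete, here is a minimal, forward-pass-only sketch of a BiLSTM word classifier in plain Python. It is not the authors' network: the input is assumed to be a sequence of feature vectors derived from the transmission spectra (one per 10 ms frame at 100 Hz), and the feature dimension, hidden size, and random, untrained weights are all hypothetical. One LSTM reads the frames forward, a second reads them in reverse, and their final hidden states are concatenated and mapped to probabilities over the 50 words.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(mat, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-direction LSTM; the four gates (input, forget, cell, output) are stacked."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix acting on the concatenation [x_t, h_{t-1}].
        self.w = rand_matrix(4 * n_hidden, n_in + n_hidden, rng)
        self.b = [0.0] * (4 * n_hidden)

    def run(self, frames):
        """Process the frames in order and return the final hidden state."""
        H = self.n_hidden
        h, c = [0.0] * H, [0.0] * H
        for x in frames:
            z = [zi + bi for zi, bi in zip(matvec(self.w, list(x) + h), self.b)]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


class BiLSTMClassifier:
    """Hypothetical word classifier: forward + backward LSTM, then a softmax layer."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTM(n_in, n_hidden, rng)
        self.bwd = LSTM(n_in, n_hidden, rng)
        self.w_out = rand_matrix(n_classes, 2 * n_hidden, rng)

    def predict_proba(self, frames):
        # Concatenate the final states of the forward and the time-reversed pass.
        h = self.fwd.run(frames) + self.bwd.run(frames[::-1])
        logits = matvec(self.w_out, h)
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, frames):
        p = self.predict_proba(frames)
        return max(range(len(p)), key=p.__getitem__)


# Hypothetical utterance: 80 frames (0.8 s at 100 Hz) with 12 features each.
rng = random.Random(1)
utterance = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(80)]
model = BiLSTMClassifier(n_in=12, n_hidden=16, n_classes=50)
probs = model.predict_proba(utterance)
print(len(probs), round(sum(probs), 6))  # 50 1.0
```

Concatenating the two final states is one common way to summarize a whole sequence for classification; pooling over all time steps is an alternative. Training (e.g., backpropagation through time with a cross-entropy loss) is omitted here.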

Highlights

  • Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it

  • […] recognition [21], surface electromyography (sEMG) is currently the most developed measuring technique for silent speech interfaces (SSIs). sEMG relies on secondary articulatory signals, as opposed to direct measurement techniques such as ultrasound (US), electromagnetic articulography (EMA), permanent magnetic articulography (PMA), optopalatography (OPG), or electro-optical stomatography (EOS), which capture the actual location of the tongue and lips in 2D or 3D space

  • We developed custom acquisition hardware capable of measuring the transmission spectra through the vocal tract at a measurement update rate of at least 100 Hz, which is usually considered the lower bound for real-time speech acquisition in SSIs
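A measurement update rate of 100 Hz leaves 10 ms for each complete measurement frame. In a stepped-frequency sweep, that frame period has to be shared by all frequency steps and all measured antenna paths. The sketch below computes the resulting dwell-time budget per frequency step under stated assumptions: the number of frequency steps (128) is hypothetical, the three antennas are assumed to form one transmission path per antenna pair, the paths are assumed to be measured one after another, and switching or settling overhead is ignored.

```python
from itertools import combinations


def frame_period_ms(update_rate_hz: float) -> float:
    """Time available for one complete measurement frame, in milliseconds."""
    return 1000.0 / update_rate_hz


def dwell_time_us(update_rate_hz: float, n_freq_steps: int, n_paths: int) -> float:
    """Maximum time per frequency step, in microseconds, if all paths are
    swept sequentially (hypothetical scheme; no switching overhead)."""
    total_steps = n_freq_steps * n_paths
    return frame_period_ms(update_rate_hz) * 1000.0 / total_steps


# Assumption: one transmission path per antenna pair.
antennas = ["left cheek", "right cheek", "chin"]
paths = list(combinations(antennas, 2))

print(len(paths))                                       # 3
print(frame_period_ms(100))                             # 10.0
print(round(dwell_time_us(100, 128, len(paths)), 2))    # 26.04
```

Raising the update rate or the number of frequency steps shrinks the dwell time proportionally, which is the basic trade-off between temporal and spectral resolution in such a design.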


Summary

Introduction

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Silent speech interfaces (SSIs) all share the aim of restoring or enhancing oral communication from coexisting, non-audible (bio)signals that are generated during speech production. Their potential applications range from voice restoration for patients who have undergone laryngectomy [4,5] to enabling private conversations in public areas and enhancing speech intelligibility in noisy environments [3]. For this purpose, a number of measuring modalities have been proposed that differ in the type of biosignal they leverage and in whether these signals are measured invasively or non-invasively. Several methods have been proposed to substantially reduce the so-called inter-session variability, i.e., the differences between signals recorded in separate sessions [35], but it remains an intrinsic difficulty of SSIs whose sensors can vary in their placement.
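One generic way to reduce inter-session differences is to standardize each feature within each recording session, which removes per-feature offsets and scale changes that stay constant over a session (such as those a slightly shifted sensor might introduce). The sketch below is only an illustration of that idea, not the method of [35] or of the authors, and the data layout (a dict from hypothetical session IDs to lists of feature frames) is an assumption.

```python
from statistics import fmean, pstdev


def normalize_per_session(sessions, eps=1e-8):
    """Z-score every feature dimension separately within each session.

    sessions: dict mapping a session ID to a list of frames, each frame
              being a list of feature values (hypothetical layout).
    """
    normalized = {}
    for sid, frames in sessions.items():
        columns = list(zip(*frames))           # one tuple per feature dimension
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        normalized[sid] = [
            [(v - m) / (s + eps) for v, m, s in zip(frame, means, stds)]
            for frame in frames
        ]
    return normalized


# Session 2 is a shifted and scaled copy of session 1 (simulated re-attachment).
s1 = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
s2 = [[2 * a + 5, 0.5 * b - 1] for a, b in s1]
out = normalize_per_session({"s1": s1, "s2": s2})
print(all(abs(x - y) < 1e-6
          for f1, f2 in zip(out["s1"], out["s2"])
          for x, y in zip(f1, f2)))  # True
```

Normalizing with statistics of an entire session assumes that session's data is available at once; an online system would instead need running estimates of the mean and standard deviation.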

