Abstract

A complete neurobiological understanding of speech motor control requires determining the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, so simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system for monitoring vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and present new analyses of the resulting data. Classification and regression analyses revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes, which allowed higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from the vocal tract measurements, and listeners could perceptually identify the synthesized vowels. We also demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. Together, these results demonstrate a multi-modal system for noninvasively monitoring articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and for creating vocal prosthetics.
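
To make the dimensionality-reduction step concrete, the following is a minimal sketch, not the authors' pipeline, of how non-negative matrix factorization could extract basis vocal tract shapes from articulator kinematics and feed a vowel classifier. All data, dimensions, and the choice of classifier (linear discriminant analysis via scikit-learn) are assumptions for illustration.

```python
# A minimal sketch (assumed data and dimensions, not the paper's pipeline):
# factor articulator kinematics with NMF, then classify vowels from the
# resulting basis weights.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: one row per vowel token, columns are tracked
# articulator coordinates (lips, jaw, tongue). Stand-in random values.
n_tokens, n_coords = 540, 20                      # e.g. 6 speakers x 9 vowels x 10 tokens
X = rng.random((n_tokens, n_coords))
y = rng.permutation(np.repeat(np.arange(9), 60))  # balanced stand-in vowel labels

# NMF requires non-negative input, so real coordinates would first be
# shifted so each column's minimum is zero.
X = X - X.min(axis=0)

# Factor X ~ W @ H: rows of H are basis vocal tract shapes, rows of W are
# per-token weights on those shapes.
nmf = NMF(n_components=5, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(X)  # (n_tokens, 5) basis weights
H = nmf.components_       # (5, n_coords) basis shapes

# Classify vowel identity from the basis weights; chance is 1/9, about 11%.
acc = cross_val_score(LinearDiscriminantAnalysis(), W, y, cv=5).mean()
print(f"cross-validated vowel classification accuracy: {acc:.2f}")
```

The number of components is a free parameter here; in practice it would be chosen by reconstruction error or classification performance on held-out data.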

Highlights

  • The ability to communicate through spoken language involves the generation of a wide array of sounds [1,2,3]

  • We describe methods, compatible with human electrophysiology, for the acquisition and analysis of high-resolution kinematic data from the diverse set of vocal tract articulators

  • Acoustics were the best predictor of vowel category, with 88% correct classification on average; classification based on the lips alone (24%), the tongue alone (43%), and all kinematic features combined (52%) also performed significantly better than chance (11%) (P < 0.05, Wilcoxon signed-rank tests (WSRT), N = 6); a code sketch of this per-speaker comparison follows this list

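As an illustration of the statistics in the last highlight, here is a hedged sketch of the per-speaker comparison: classify vowels from each feature set, then test the six per-speaker accuracies against chance with a Wilcoxon signed-rank test. The feature dimensionalities, the classifier, and the stand-in random data are assumptions, not the paper's actual features or pipeline (with random inputs, accuracies will hover near chance).

```python
# Hedged sketch of the per-speaker analysis: per-feature-set vowel
# classification accuracy for each of 6 speakers, then a Wilcoxon
# signed-rank test (WSRT) of those accuracies against chance (1/9).
# All data and dimensions are stand-ins.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
CHANCE = 1.0 / 9.0  # nine vowel categories

def speaker_accuracy(X, y):
    """Cross-validated vowel classification accuracy for one speaker."""
    return cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

# Assumed feature dimensionalities per feature set (placeholders).
feature_dims = {"acoustics": 12, "lips": 4, "tongue": 8, "all kinematics": 12}
accs = {name: [] for name in feature_dims}

for _ in range(6):  # one iteration per speaker
    y = rng.permutation(np.repeat(np.arange(9), 10))  # 90 balanced tokens
    for name, dim in feature_dims.items():
        X = rng.random((90, dim))  # stand-in features for this speaker
        accs[name].append(speaker_accuracy(X, y))

# WSRT across speakers (N = 6): is each feature set's accuracy above chance?
for name, values in accs.items():
    stat, p = wilcoxon(np.asarray(values) - CHANCE, alternative="greater")
    print(f"{name}: mean accuracy {np.mean(values):.2f}, WSRT p = {p:.3f}")
```

With only six speakers the signed-rank test runs in exact mode, which is why it is well suited to the small-N, per-speaker design described above.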

Introduction

The ability to communicate through spoken language involves the generation of a wide array of sounds [1,2,3]. Speech sounds are produced by the coordinated movements of the speech articulators, namely the lips, jaw, tongue, and larynx [4].
