Abstract

A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed via acoustic sensitivity functions into time-varying modulations of the vocal tract shape. Because the time courses of successive events may overlap considerably, coarticulatory effects are generated automatically. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.

Highlights

  • Speech production is often viewed as a process of planning and executing articulatory movements that generate an acoustic signal comprising a temporally ordered stream of phonetic segments

  • Story and Bunton (2017) proposed a method, in part inspired by the distinctive region model of Mrayati et al. (1988), in which an utterance is planned by specifying directional changes of the resonance frequencies relative to those of the underlying vocal tract configuration

  • The kinetic energy (KE) and potential energy (PE) associated with each resonance frequency are based on the pressure pj(i) and volume velocity uj(i) computed for each section of an area vector
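The per-section energy bookkeeping in the highlight above can be sketched as a sensitivity-function calculation in the classic form S_n(i) = (KE_n(i) − PE_n(i)) / E_total. The analytic closed-open mode shapes used here are an assumption made only so the example is self-contained; the model computes pj(i) and uj(i) numerically for an arbitrary area vector.

```python
# Minimal sketch: acoustic sensitivity functions for a uniform tube,
# closed at the glottis and open at the lips. All tube dimensions and
# the analytic mode shapes are illustrative assumptions.
import numpy as np

def sensitivity_functions(n_sections=44, n_modes=3, length=0.175):
    """Return an (n_modes, n_sections) array of sensitivity functions
    S_n(i) = (KE_n(i) - PE_n(i)) / sum_i(KE_n(i) + PE_n(i))."""
    dx = length / n_sections
    x = (np.arange(n_sections) + 0.5) * dx        # section midpoints
    S = np.zeros((n_modes, n_sections))
    for n in range(1, n_modes + 1):
        k = (2 * n - 1) * np.pi / (2 * length)    # wavenumber of mode n
        p = np.cos(k * x)                          # pressure mode shape
        u = np.sin(k * x)                          # volume-velocity mode shape
        ke = u ** 2                                # kinetic energy per section
        pe = p ** 2                                # potential energy per section
        S[n - 1] = (ke - pe) / np.sum(ke + pe)
    return S

S = sensitivity_functions()
```

For the first resonance of this uniform tube, S is positive near the open (lip) end where kinetic energy dominates, so expanding the area there raises the frequency, and negative near the closed (glottal) end where potential energy dominates.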


Summary

INTRODUCTION

Speech production is often viewed as a process of planning and executing articulatory movements that generate an acoustic signal comprising a temporally ordered stream of phonetic segments. Models of speech production are typically designed to emulate this process, where the movements of the tongue, jaw, lips, velum, and larynx, or some lower dimensional representation of articulation, are orchestrated to collectively form the time-varying shape of the vocal tract and transform the voice source into speech (cf., Mermelstein, 1973; Coker, 1976; Rubin et al., 1981; Maeda, 1990; Browman and Goldstein, 1992; Story, 2005, 2009, 2013; Toutios et al., 2011). In the model described here, by contrast, an utterance is specified as directional changes of the resonance frequencies relative to those of the underlying vocal tract configuration. When associated with a temporal "event" function, the specified resonance deflections are transformed, via calculations of acoustic sensitivity functions, into a time-varying modulation of the vocal tract. An advantage of this approach is that an explicit specification of vocal tract characteristics such as constriction location is not required. The model itself finds a time-dependent vocal tract deformation pattern, containing constrictions and synergistic expansions, that results in the specified acoustic goal.
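The transformation described above can be sketched as follows: a resonance deflection vector, weighted by a temporal event function and projected through the sensitivity functions, modulates each section of the area vector over time. The raised-cosine event shape, the deflection magnitudes, and all parameter values below are illustrative assumptions, not the model's exact formulation.

```python
# Hedged sketch of event-driven vocal tract modulation. The sensitivity
# functions use assumed analytic mode shapes for a uniform closed-open tube.
import numpy as np

N_SEC, N_MODES, L = 44, 2, 0.175

def sensitivities():
    """Sensitivity functions S_n(i) for a uniform closed-open tube."""
    x = (np.arange(N_SEC) + 0.5) * (L / N_SEC)
    S = []
    for n in range(1, N_MODES + 1):
        k = (2 * n - 1) * np.pi / (2 * L)
        ke, pe = np.sin(k * x) ** 2, np.cos(k * x) ** 2
        S.append((ke - pe) / np.sum(ke + pe))
    return np.array(S)

def event(t, t0, dur):
    """Raised-cosine temporal event function, 0 -> 1 -> 0 over [t0, t0+dur]."""
    g = 0.5 * (1 - np.cos(2 * np.pi * (t - t0) / dur))
    return np.where((t >= t0) & (t <= t0 + dur), g, 0.0)

t = np.linspace(0, 0.5, 501)            # 0.5 s time axis
z = np.array([-0.3, 0.4])               # deflections: lower fR1, raise fR2
g = event(t, 0.1, 0.3)                  # one acoustic event
S = sensitivities()

a0 = np.full(N_SEC, 3.0)                # neutral area vector (cm^2)
# Each section is scaled by the event-weighted projection of the
# deflections onto the sensitivity functions -> shape (time, section).
A = a0 * (1 + g[:, None] * (z @ S))
```

Because the sensitivity functions of different resonances are approximately orthogonal along the tube, this projection steers each resonance in its specified direction, and the resulting deformation contains the constrictions and expansions without their locations being specified explicitly. Overlapping two such event functions in time would sum their modulations, which is how coarticulatory effects arise in this scheme.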

VOCAL TRACT MODEL CONTROLLED BY RELATIVE ACOUSTIC EVENTS
TRANSFORMATION OF RESONANCE DEFLECTION PATTERNS INTO VOCAL TRACT MODULATION
Sensitivity function calculation
Adjustments to the sensitivity functions
Calculation of the deformation function
Sequencing multiple acoustic events
SENTENCE-LEVEL SPEECH PRODUCTION
Sentence 1
Sentence 2
DISCUSSION AND CONCLUSION