Abstract
Summary form only given, as follows. Quantitative models of human speech production and perception mechanisms provide important insights into our cognitive abilities and can lead to high-quality speech synthesis, robust automatic speech recognition and coding schemes, and better speech and hearing prostheses. Some of our research activities in these two areas are described. Our speech production work involved collecting and analyzing magnetic resonance images (MRI), acoustic recordings, and electropalatography (EPG) data from talkers of American English. The articulatory database is the largest of its kind in the world and contains the first images of liquids (such as /l/ and /r/) and fricatives (such as /s/ and /sh/) for both male and female talkers. MR images are useful for characterizing the 3D geometry of the vocal tract (VT) and for measuring lengths, area functions, and volumes. EPG is used to study inter- and intra-speaker variability in articulatory dynamics, while acoustic recordings are necessary for modeling. Inter- and intra-speaker characteristics of VT and tongue shapes are illustrated for various speech sounds, along with results of acoustic modeling based on the MRI and acoustic data. The implications of our findings for vocal-tract normalization schemes and speech synthesis are also discussed. In the speech perception area, aspects of auditory signal processing and speech perception are parameterized and implemented in a speech recognition system. Our models parameterize sensitivity to spectral dynamics and to local peak frequency positions in the speech signal; these cues remain robust when listening to speech in noise. Recognition evaluations using the dynamic model with a stochastic hidden Markov model (HMM) recognition system showed increased robustness to noise over other state-of-the-art representations. The applications of auditory modeling to speech coding are also discussed. We developed an embedded, perceptually based speech and audio coder. Perceptual metrics ensure that encoding is optimized for the human listener; they are based on calculating the signal-to-mask ratio in short-time frames of the input signal. An adaptive bit allocation scheme is employed, and the subband energies are then quantized. The coder is variable-rate, noise-robust, and suitable for wireless communications.
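To make the coding step concrete, the sketch below shows the general kind of signal-to-mask-ratio (SMR) driven adaptive bit allocation the abstract describes. It is a minimal Python illustration under stated assumptions, not the authors' coder: the function name, the per-subband inputs, and the roughly 6 dB-per-bit noise-reduction rule of thumb are all assumptions made for the example.

import numpy as np

def allocate_bits(subband_energy_db, mask_threshold_db, total_bits):
    """Greedy adaptive bit allocation driven by signal-to-mask ratio (SMR).

    subband_energy_db : per-subband signal energy for one short-time frame (dB)
    mask_threshold_db : per-subband masking threshold from a psychoacoustic
                        model (dB); both arrays are hypothetical inputs here
    total_bits        : bit budget for the frame

    Returns an integer bit count per subband. Each assigned bit is assumed
    to lower quantization noise by about 6 dB, so bits go to the subband
    whose noise-to-mask ratio is currently the worst (most audible noise).
    """
    smr = subband_energy_db - mask_threshold_db      # signal-to-mask ratio per band
    bits = np.zeros_like(smr, dtype=int)
    nmr = smr.copy()                                 # noise-to-mask ratio, starts at SMR
    for _ in range(total_bits):
        worst = int(np.argmax(nmr))                  # band where noise is most audible
        bits[worst] += 1
        nmr[worst] -= 6.0                            # assumed ~6 dB gain per allocated bit
    return bits

# Hypothetical example: 8 subbands, 32 bits available for the frame.
energy = np.array([60, 55, 48, 40, 35, 30, 25, 20], dtype=float)
mask   = np.array([30, 32, 30, 28, 27, 26, 25, 24], dtype=float)
print(allocate_bits(energy, mask, total_bits=32))

In this greedy formulation, bands with large SMR (strong signal well above the masking threshold) receive more bits, which is the sense in which the encoding is "optimized to the human listener": quantization noise is pushed below the masking threshold wherever the bit budget allows.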