Abstract

A theory of voice production for vowels has to deal with two related problems: the biomechanical modeling of vocal fold vibrations and the calculation of the volume-velocity airflow through the glottis, or glottal airflow. This report is a tutorial on the second problem, which we call the aerodynamic and acoustic theory of voice production. Calculating glottal airflow is difficult because it depends on an interaction between (1) the nonlinear, time-varying glottal impedance, specified in the time domain, and (2) the subglottal and vocal tract input impedances, specified in the frequency domain. The effect of glottal geometry on the glottal impedance is discussed, along with the role of glottal impedance elements such as kinetic resistance, viscous resistance and glottal inductance in determining glottal airflow. Methods for calculating the vocal tract or subglottal input impedance, based on a transmission-line analog model and a formant network model, are presented. Equations to find glottal airflow with source-filter interaction are derived. A digital pole-zero model of the input impedance is proposed for efficient and accurate computation of glottal airflow. The role of various factors in determining the so-called residue, ripple and superposition components of glottal airflow is discussed with examples. The time-domain response of a vowel is calculated using the glottal airflow with source-filter interaction. The instantaneous frequency and instantaneous bandwidth of an interactive vowel response are computed and interpreted. Further research is needed to extend the theory to breathy vowels, vowel onsets, and consonant-to-vowel and vowel-to-consonant transitions, where the acoustic waves are superposed on a large, dynamically changing mean airflow. A good understanding of the theory guides appropriate modeling and interpretation of the voice source.
The relevant features of the voice source for a specific application, such as forensic speaker identification, can thus be identified. The author believes that habitually formed relative dynamic variations in voice source parameters are of greater significance in forensic speaker recognition.
