Speech analysis using a time‐varying ARX model for separating the source‐tract coupling of vowels

Tetsuo Funada

doi:10.1121/1.2026225

Abstract

The purpose of this research is to extract formant frequencies precisely and to classify voiced/unvoiced intervals accurately based on a source‐tract model. The source wave (i.e., the glottal volume flow) and the vocal tract (VT) characteristics are estimated sequentially using a time‐varying “ARX model,” that is, an AR (autoregressive) model with an auxiliary nonwhite input (X input). In the present research, the X input represents the glottal volume flow. Applications to synthetic vowels generated by the two‐mass model yielded the following results: (1) the X input carried much information on glottal closure and opening; and (2) compared with the conventional (autocorrelation) LP method, formant frequencies (especially the first formant) during the open period of the glottis were estimated more accurately. For real vowels uttered by a male speaker, the phase of the X input was also observed to agree with the phase of the glottal movement, which can be confirmed by electroglottography (EGG).
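
To make the idea of an AR model driven by a nonwhite X input concrete, the following is a minimal sketch and not the author's algorithm. It assumes the model form y[n] = -(a_1 y[n-1] + ... + a_p y[n-p]) + b0 u[n] + e[n] and, unlike the method described in the abstract, it treats the X input u[n] as known rather than estimating it. The coefficients are tracked sample by sample with recursive least squares and a forgetting factor, a standard technique chosen here only for illustration. The function names (`synth_arx`, `rls_arx`, `formants_from_ar`) and the parameters `lam` and `delta` are hypothetical.

```python
import numpy as np

def synth_arx(a, b0, u):
    """Generate y[n] = -sum_k a[k] * y[n-1-k] + b0 * u[n] (assumed model form)."""
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(u))
    for n in range(len(u)):
        past = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y[n] = -past + b0 * u[n]
    return y

def rls_arx(y, u, p=2, lam=0.99, delta=1e3):
    """Track [a_1..a_p, b0] over time by recursive least squares.

    lam is the forgetting factor (values below 1 let the estimate follow
    slowly varying coefficients); delta scales the initial covariance.
    The input u is assumed known here, which is a simplification.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    theta = np.zeros(p + 1)            # current estimate [a_1..a_p, b0]
    P = delta * np.eye(p + 1)          # inverse-information matrix
    history = np.zeros((len(y), p + 1))
    for n in range(len(y)):
        if n >= p:
            # Regressor: [-y[n-1], ..., -y[n-p], u[n]]
            phi = np.concatenate((-y[n - p:n][::-1], [u[n]]))
            gain = P @ phi / (lam + phi @ P @ phi)
            err = y[n] - phi @ theta   # one-step prediction error
            theta = theta + gain * err
            P = (P - np.outer(gain, phi @ P)) / lam
        history[n] = theta
    return history

def formants_from_ar(a, fs):
    """Resonance frequencies (Hz) from the roots of 1 + a_1 z^-1 + ... + a_p z^-p."""
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

As a usage example, one can synthesize a single resonance driven by a smooth periodic pulse (a crude stand-in for glottal flow), run `rls_arx` on the result, and pass the final AR coefficients to `formants_from_ar` to recover the resonance frequency. Because the input term is modeled explicitly, the AR part does not have to absorb the spectral shape of the source.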
