Abstract

The purpose of this research is to extract formant frequencies precisely and to classify voiced/unvoiced intervals accurately on the basis of a source‐tract model. The source wave (i.e., the glottal volume flow) and the vocal tract (VT) characteristics are estimated sequentially with a time‐varying “ARX model,” that is, an AR (autoregressive) model driven by an auxiliary nonwhite input (the X input). In the present research, the X input represents the glottal volume flow. Applications to synthetic vowels generated by the two‐mass model yielded the following results: (1) the X input carried substantial information about glottal closure and opening; and (2) compared with the conventional (autocorrelation) LP method, formant frequencies during the open period of the glottis, especially the first formant, were estimated more accurately. For real vowels uttered by a male speaker, the phase of the X input was also observed to agree with the phase of the glottal movement as observed with electroglottography (EGG).
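To make the ARX idea concrete, the sketch below is a minimal illustration in Python/NumPy, not the authors' method. It fits a time‐invariant ARX model to a single frame by ordinary least squares, assuming the X input `u` is already known (as it can be for a synthetic test signal), and then reads formant frequencies and bandwidths from the roots of the estimated AR polynomial. The time‐varying, sequential estimation described above, in which the source itself must also be estimated, is not reproduced. The function names, the sign convention, the model orders and the filtered‐noise stand‐in for the glottal input are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_arx(s, u, p, q):
    """Least-squares fit of a time-invariant ARX model over one frame.

    Assumed model (sign convention chosen for this sketch):
        s[n] = -sum_{i=1..p} a[i] * s[n-i] + sum_{j=0..q} b[j] * u[n-j] + e[n]
    Returns a = [1, a1, ..., ap] (AR part) and b = [b0, ..., bq] (X-input part).
    """
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(p, q)
    rows, targets = [], []
    for n in range(start, len(s)):
        past_s = -s[n - p:n][::-1]    # -s[n-1], ..., -s[n-p]
        cur_u = u[n - q:n + 1][::-1]  # u[n], ..., u[n-q]
        rows.append(np.concatenate([past_s, cur_u]))
        targets.append(s[n])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return np.concatenate([[1.0], theta[:p]]), theta[p:]


def formants_from_ar(a, fs):
    """Formant frequencies and bandwidths (Hz) from the poles of 1/A(z)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]


def resonator_poly(freqs, bws, fs):
    """Hypothetical test VT filter: one conjugate pole pair per (freq, bandwidth)."""
    a = np.array([1.0])
    for f, bw in zip(freqs, bws):
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])
    return a


def simulate_arx(a, b, u):
    """Generate s[n] from the ARX difference equation with zero initial state."""
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = sum(b[j] * u[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i] * s[n - i] for i in range(1, len(a)) if n - i >= 0)
        s[n] = acc
    return s


# Illustrative, noise-free test: two "formants" driven by a nonwhite input.
fs = 10000
true_a = resonator_poly([500, 1500], [60, 90], fs)  # AR order p = 4
rng = np.random.default_rng(0)
u = np.convolve(rng.standard_normal(2000), np.ones(5) / 5, mode="same")  # low-pass noise
s = simulate_arx(true_a, [1.0], u)

a_hat, b_hat = fit_arx(s, u, p=4, q=0)
F, B = formants_from_ar(a_hat, fs)
print(np.round(F), np.round(B))
```

In this idealized case the model orders match the generator and there is no noise, so the least‐squares fit should recover the AR coefficients essentially exactly. With real speech the glottal input is not observed and must itself be estimated, which is the harder problem the abstract addresses.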
