Abstract

This paper critically evaluates how well a nonstationary analysis method tracks speech formant frequencies as they change over time with the natural variations of the vocal-tract system during speech production. Instantaneous frequency estimation is applied to formant tracking in order to observe time variations in the vocal-tract system characteristics within a pitch period. An implementation of an instantaneous frequency estimator based on the source–filter model of speech production is described for voiced speech formants. Experimental results from simulated as well as natural speech data show that the accuracy of the frequency estimates depends heavily on the nature of the glottal excitation waveform, the fundamental frequency, and the frequency spacing of the formants in the speech signal. The effect of the choice of analysis parameters on the accuracy of the estimates is discussed. The instantaneous frequency estimate accurately represents the actual formant frequency only when the formants are well separated and the glottal cycle contains distinct regions in which the source excitation can be considered negligible. Experimental results on natural speech vowels are presented that show differences in formant frequencies between the different phases of the glottal cycle.
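
As a rough illustration of what an instantaneous frequency estimate is, the sketch below applies a generic analytic-signal (Hilbert transform) estimator to a single simulated formant resonance with no source excitation, the favourable case described above. It is not the estimator evaluated in the paper, which is based on the source–filter model. The names `analytic_signal` and `instantaneous_frequency`, the sampling rate, and the formant frequency and bandwidth are all illustrative assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC/Nyquist, double positive
    frequencies, zero negative frequencies (generic textbook construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz: scaled derivative of the unwrapped
    phase of the analytic signal (one value per pair of adjacent samples)."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


# Assumed, illustrative values (not taken from the paper).
fs = 8000.0            # sampling rate in Hz
formant_hz = 1000.0    # centre frequency of the simulated formant
bandwidth_hz = 40.0    # formant bandwidth, sets the decay rate
t = np.arange(160) / fs  # 20 ms of signal

# A freely decaying resonance: the response of one formant when the
# source excitation is negligible.
resonance = np.exp(-np.pi * bandwidth_hz * t) * np.cos(2 * np.pi * formant_hz * t)

inst_freq = instantaneous_frequency(resonance, fs)

# Skip the first samples (FFT edge effects) and the low-amplitude tail,
# then summarise the interior with a median.
interior = inst_freq[len(inst_freq) // 8: len(inst_freq) // 2]
median_estimate = float(np.median(interior))
print(f"median instantaneous frequency: {median_estimate:.1f} Hz")
```

For a single, isolated resonance the interior estimate should stay close to the assumed formant frequency. Adding a second nearby resonance or a nonzero excitation to `resonance` is a simple way to see the estimate degrade, in line with the dependence on formant spacing and glottal excitation noted above.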
