Abstract

Moment calculation is applied to extract the formant frequencies of a speech spectrum. Three kinds of first-order moments divide the spectrum into four frequency regions. The centers of gravity of the first three regions are calculated to give zeroth-order estimates of the first, second, and third formant frequencies. The upper and lower bounds of each region are then modified so that the estimated frequency comes closer to the major peak of the spectrum, using the second-order and third-order moments, which represent the variance and skewness of the spectral pattern. The process repeats until the kth estimate equals the (k − 1)th estimate. This modification improves the estimation precision significantly. An experiment with model spectra generated by an all-pole model, using formant frequencies typical of the five Japanese vowels, gave an estimation precision of 3%. Speech materials spoken by five male and five female speakers were used for this experiment. The speech waveform was passed through a 5 kHz low-pass filter, sampled at 10 kHz, and quantized to 12 bits; the spectrum envelope was then calculated from the first 24 cepstral coefficients of a 256-point FFT spectrum. The results give acceptable precision compared with visually determined formant frequencies.
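
The iterative refinement described above can be illustrated with a minimal sketch. The abstract does not give the rule used to divide the spectrum into regions or the exact formula for updating the bounds, so everything specific here is an assumption made for illustration: the function names `band_moments` and `refine_formant`, the parameters `width` and `skew_gain`, and the update rule itself (re-center the window on the centroid, shift it against the skewness, and set its half-width to a multiple of the standard deviation). It shows one plausible way the three moments could be combined, not the authors' method.

```python
import numpy as np

def band_moments(freqs, power, lo, hi):
    """Centroid, standard deviation and skewness of `power` for lo <= f <= hi."""
    mask = (freqs >= lo) & (freqs <= hi)
    f, p = freqs[mask], power[mask]
    total = p.sum()
    if total <= 0.0:
        raise ValueError("band is empty or contains no energy")
    mean = (f * p).sum() / total                      # first-order moment
    std = np.sqrt((((f - mean) ** 2) * p).sum() / total)  # from second-order moment
    if std > 0.0:
        skew = (((f - mean) ** 3) * p).sum() / total / std**3  # third-order moment
    else:
        skew = 0.0
    return mean, std, skew

def refine_formant(freqs, power, lo, hi, width=2.0, skew_gain=0.5,
                   max_iter=50, tol=1e-6):
    """Iteratively narrow a region [lo, hi] around its dominant spectral peak.

    Hypothetical update rule (not taken from the abstract): the new window is
    centered on the current centroid, shifted away from the long tail
    indicated by the skewness, with half-width `width` standard deviations.
    Assumes `freqs` is a uniformly spaced, increasing grid.
    """
    spacing = freqs[1] - freqs[0]
    estimate, std, skew = band_moments(freqs, power, lo, hi)  # zeroth-order estimate
    for _ in range(max_iter):
        center = estimate - skew_gain * std * np.clip(skew, -1.0, 1.0)
        half = max(width * std, spacing)              # never narrower than one bin
        new_lo = max(lo, center - half)               # stay inside the initial region
        new_hi = min(hi, center + half)
        new_estimate, std, skew = band_moments(freqs, power, new_lo, new_hi)
        if abs(new_estimate - estimate) <= tol:       # k-th equals (k-1)-th estimate
            return new_estimate
        estimate = new_estimate
    return estimate

if __name__ == "__main__":
    # Synthetic example: one peak at 500 Hz on a flat background, 0 to 5 kHz.
    freqs = np.linspace(0.0, 5000.0, 501)
    power = np.exp(-0.5 * ((freqs - 500.0) / 50.0) ** 2) + 0.05
    first, _, _ = band_moments(freqs, power, 0.0, 1200.0)
    print("zeroth-order estimate:", first)
    print("refined estimate:", refine_formant(freqs, power, 0.0, 1200.0))
```

In this synthetic case the flat background pulls the initial center of gravity away from the peak, and narrowing the region around the centroid reduces that bias. The `max_iter` guard is included because, on a discrete frequency grid, a rule of this kind could in principle alternate between two neighboring windows instead of settling on one.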
