Abstract

Speech analysis and synthesis by linear prediction is based on the assumption that the short-time spectral envelope of speech can be represented by a number of poles. An all-pole representation does not provide an accurate description of speech spectra, particularly for nasal and nasalized sounds. In this paper, a method is presented for characterizing speech in terms of the parameters of a pole-zero model. The basic approach used in the method is similar to the one used for the all-pole case: the parameters of the pole-zero model are obtained by minimizing the mean-squared prediction error over the analysis interval. It is shown that, although the equations resulting from the minimization are nonlinear, both poles and zeros can be determined without using an iterative procedure. The method has been applied to real speech data, and the results show that the speech spectra derived from the pole-zero model agree very closely with the actual spectra derived by direct FFT analysis. Audible distortion due to inadequate treatment of zeros in the all-pole case disappears with the pole-zero representation.
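
For orientation, the all-pole baseline that the abstract refers to can be sketched as follows. This is not the pole-zero method of the paper: it is a minimal illustration of conventional linear prediction, assuming the autocorrelation formulation solved with the Levinson-Durbin recursion. The function names (autocorrelation, levinson_durbin, synth_ar2) and the synthetic second-order test signal are hypothetical and chosen only for illustration.

```python
import random


def autocorrelation(x, max_lag):
    """Autocorrelation sums r[0..max_lag] of the analysis frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the all-pole normal equations.

    Returns (a, err), where the predictor is
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    and err is the residual (prediction-error) energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are coefficients
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synth_ar2(n, a1=1.3, a2=-0.7, seed=0):
    """Hypothetical test signal: a stable two-pole process driven by noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


frame = synth_ar2(4000)
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs, residual)
```

On a signal that really is all-pole, as in this synthetic case, the estimated coefficients should land close to the values used to generate it. A spectrum containing zeros, such as that of a nasal sound, cannot be matched this way with a small number of poles, which is the limitation the pole-zero model addresses.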
