In this paper, the AM–FM modulation model is applied to speech analysis, synthesis, and coding. The AM–FM model represents the speech signal as a sum of formant resonance signals, each of which contains amplitude and frequency modulation. Multiband filtering and demodulation using the energy separation algorithm are the basic tools used for speech analysis. First, multiband demodulation analysis (MDA) is applied to the problem of fundamental frequency estimation, using the average instantaneous frequencies as estimates of the pitch harmonics. The MDA pitch tracking algorithm is shown to produce smooth and accurate fundamental frequency contours. Next, the AM–FM modulation vocoder is introduced, which represents speech as a sum of resonance signals. A time-varying filterbank extracts the formant bands, and the energy separation algorithm then demodulates each resonance signal into its amplitude envelope and instantaneous frequency. Efficient algorithms for modeling and coding (at 4.8–9.6 kbit/s) the amplitude envelope and instantaneous frequency of the speech resonances are proposed. Finally, the perceptual importance of modulations in speech resonances is investigated, and it is shown that amplitude modulation patterns are both speaker and phone dependent.
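To make the demodulation step concrete: the energy separation approach builds on the discrete Teager–Kaiser energy operator, Ψ[x](n) = x²(n) − x(n−1)x(n+1), which for a sinusoid of amplitude A and frequency Ω (rad/sample) yields A² sin²Ω. Below is a minimal sketch of one standard discrete variant (DESA-2); the function names are illustrative and this is not the paper's exact implementation, which also involves the time-varying multiband filtering described above.

```python
import numpy as np

def teager(x):
    """Discrete Teager-Kaiser energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1), defined at interior samples."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x):
    """Discrete Energy Separation Algorithm (DESA-2) sketch.

    Estimates the amplitude envelope |a(n)| and instantaneous frequency
    Omega(n) in rad/sample of an AM-FM signal x(n) = a(n) * cos(phi(n)),
    assuming modulations are slow relative to the carrier.
    """
    y = x[2:] - x[:-2]               # y(n) = x(n+1) - x(n-1)
    psi_x = teager(x)[1:-1]          # trim to align with teager(y)
    psi_y = teager(y)
    # For a sinusoid: Psi[y] / (2 Psi[x]) = 2 sin^2(Omega) = 1 - cos(2 Omega)
    ratio = np.clip(psi_y / (2.0 * psi_x), 0.0, 2.0)
    omega = 0.5 * np.arccos(1.0 - ratio)   # instantaneous frequency estimate
    amp = 2.0 * psi_x / np.sqrt(psi_y)     # amplitude envelope estimate
    return amp, omega
```

For a pure cosine x(n) = cos(0.4n), both estimates are exact up to floating point: the frequency track is flat at 0.4 rad/sample and the envelope at 1.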