Abstract

A novel perceptual postfilter is introduced. For each frame, the filter gains, z, are estimated given a vector, y, of the quantized line spectral frequencies (LSFs) and the long-term prediction gain of the corresponding frame. The proposed perceptual postfilter is derived from an optimal MMSE estimator, i.e. the estimated gain vector is ẑ = E{z|y}. The MMSE estimator is based on the conditional pdf of z given y, which is computed from the joint pdf modelled by a Gaussian mixture model (GMM). The proposed perceptual postfilter improves speech naturalness compared with the conventional adaptive postfilter, while remaining an add-on postfilter that requires no modification to the current encoder.

Adaptive postfilters (1) have been widely applied in current Linear Prediction Analysis-by-Synthesis (LPAS) speech coders. Conventional postfiltering improves the decoded speech quality using the information available at the decoder, and is empirically designed based on aspects of human perception. As research on modelling the human auditory system has advanced, better psychoacoustic models (2, 3) have been proposed and applied in speech and audio processing, especially in audio coding. However, only a few improvements (for instance, (4, 5)) have been made to adaptive postfilters despite our better understanding of the human auditory system.

A speech codec usually operates on a frame-by-frame basis. When we have access to the clean speech and its decoded version from a speech codec, a perceptual postfilter can be constructed based on perceptual properties. The perceptual filter gains can be derived for each processing frame and applied to the decoded speech to improve the speech quality. In practice, however, the perceptual postfilter gains are not available at the decoder unless they are sent as side information. In this paper, we focus on the estimation of the perceptual postfilter gains without additional side information.
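The MMSE estimate E{z|y} under a joint GMM, as described in the abstract, can be illustrated with a minimal sketch. It assumes that a full-covariance GMM over the stacked vector [y; z] has already been trained; the function name `gmm_conditional_mean`, the argument layout, and the toy parameters are hypothetical and do not come from the paper. For each mixture component, the Gaussian conditional mean of z given y is computed, and the component means are then combined using the posterior component probabilities given y.

```python
import numpy as np

def gmm_conditional_mean(y, weights, means, covs):
    """Sketch of E{z|y} for a joint GMM over the stacked vector [y; z].

    weights: K mixture weights; means: K joint mean vectors;
    covs: K joint covariance matrices (assumed layout, not from the paper).
    """
    y = np.asarray(y, dtype=float)
    d = y.shape[0]  # dimension of the observed part y
    log_post, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        mu_y, mu_z = mu[:d], mu[d:]
        cov_yy, cov_zy = cov[:d, :d], cov[d:, :d]
        diff = y - mu_y
        sol = np.linalg.solve(cov_yy, diff)  # cov_yy^{-1} (y - mu_y)
        _, logdet = np.linalg.slogdet(cov_yy)
        # log of w * N(y; mu_y, cov_yy), the unnormalized component posterior
        log_post.append(np.log(w)
                        - 0.5 * (diff @ sol + logdet + d * np.log(2 * np.pi)))
        # per-component Gaussian conditional mean of z given y
        cond_means.append(mu_z + cov_zy @ sol)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    return post @ np.array(cond_means)

# Toy usage: one component, scalar y and scalar z (values are illustrative).
z_hat = gmm_conditional_mean(
    y=[2.0],
    weights=[1.0],
    means=[[0.0, 0.0]],
    covs=[[[1.0, 0.5], [0.5, 1.0]]],
)
print(z_hat)
```

With a single component the estimate reduces to ordinary linear regression of z on y, so the toy call gives 0.5 × 2 = 1. With several components, the posterior weights make the estimator a soft, y-dependent mixture of such linear regressors.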
Assume that a given speech frame is coded by an LPAS speech coder, so that the decoder retrieves the quantized linear prediction (LP) coefficients. The LP coefficients represent the envelope of the short-time power spectrum, which is very important for both the quality and the intelligibility of coded speech. The perceptual postfilter gains are calculated for the corresponding frame. Since the open-loop prediction gain of the long-term prediction (LTP) in speech signals indicates the degree of voicing of the speech, we also calculate the LTP gain of this frame. We take the LP coefficients and the LTP gain as an input vector, and the perceptual postfilter gains as a target vector. A feature vector is constructed from the input and target vectors. In order to find a Minimum Mean Square Error (MMSE) estimate of the target vector, a priori information of the joint probability density func-
