Abstract
This paper presents an approach, referred to as frequency domain interpolation (FDI), for achieving high-quality speech at low bit-rates (4 kb/s and below) with reasonable complexity and delay. Like prototype waveform interpolation (PWI) methods, FDI methods derive a prototype waveform (PW) at regular time intervals. Unlike PWI, however, FDI does not separate the PW into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW) component. Instead, the PW is gain-normalized and then encoded in magnitude-phase form. The magnitude is modeled as a sum of mean and deviation values in multiple frequency bands, and this model is quantized using switched backward adaptive VQ techniques. The phase information is represented as a composite vector of PW correlations in multiple frequency bands and an overall voicing measure, and this vector is quantized using a VQ at the encoder. At the decoder, a phase model uses the received phase (and magnitude) information to reproduce PWs with the correct periodicity and evolutionary characteristics. Speech is synthesized by interpolating the reconstructed PWs after gain adjustment and filtering the result with the short-term predictor and a postfilter. The designs of a 4-kb/s and a 2.4-kb/s FDI codec are presented, and their performance is characterized in terms of delay, complexity, and subjective voice quality. The results confirm that FDI techniques have the potential to deliver high-quality speech at low bit-rates in a cost-effective manner.
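As an illustration of the magnitude model described above, the following minimal sketch gain-normalizes a PW magnitude spectrum and splits it into per-band means and per-harmonic deviations. It is not the codec's actual procedure: the abstract does not specify the band layout, the gain definition, or the quantizers, so the RMS gain, the `band_edges` values, the example spectrum, and all function names here are assumptions chosen for illustration, and the VQ stage is omitted.

```python
import math

def normalize_gain(magnitudes):
    """Scale a magnitude spectrum to unit RMS (assumed gain definition)."""
    gain = math.sqrt(sum(m * m for m in magnitudes) / len(magnitudes))
    if gain == 0.0:
        return 0.0, list(magnitudes)
    return gain, [m / gain for m in magnitudes]

def band_mean_deviation(magnitudes, band_edges):
    """Return per-band means and per-harmonic deviations from those means.

    band_edges holds contiguous harmonic indices, starting at 0 and
    ending at len(magnitudes); band k covers edges[k]..edges[k+1]-1.
    """
    means, deviations = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = magnitudes[lo:hi]
        mean = sum(band) / len(band)
        means.append(mean)
        deviations.extend(m - mean for m in band)
    return means, deviations

def reconstruct(means, deviations, band_edges):
    """Rebuild the magnitude spectrum as band mean plus deviation."""
    out = []
    for mean, lo, hi in zip(means, band_edges[:-1], band_edges[1:]):
        out.extend(mean + d for d in deviations[lo:hi])
    return out

# Hypothetical 8-harmonic PW magnitude spectrum and 3-band layout.
pw_magnitude = [4.0, 3.0, 2.0, 2.0, 1.0, 1.0, 0.5, 0.5]
band_edges = [0, 2, 5, 8]

gain, normalized = normalize_gain(pw_magnitude)
means, deviations = band_mean_deviation(normalized, band_edges)
rebuilt = reconstruct(means, deviations, band_edges)
```

In a codec along these lines, the mean and deviation vectors would each be passed to a quantizer rather than reconstructed exactly. The sketch only shows that the decomposition is lossless before quantization and that the deviations in each band sum to zero by construction.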
More From: IEEE Transactions on Audio, Speech and Language Processing