Abstract

Algorithms for encoding speech with good intelligibility and naturalness at very low rates are studied. Naturalness is retained by accurately encoding the speech excitation information from an LPC (linear predictive coding) model. A glottal ARX (autoregressive with exogenous input) technique is used to model the speech signal for high quality. A large reduction in coding rate is achieved through short-term temporal decomposition of the speech and vector quantization. Application of traditional vector quantization to the temporal decomposition output is discussed, with consideration of distortion measures and codebook generation. Based on properties of short-term temporal decomposition, finite-state vector quantization is introduced to further decrease the coding rate. A problem associated with this technique, estimation of a state transition matrix with incomplete data, is treated. The general result is that practical coders operating in a range of 450-600 b/s with a delay of about 200 ms and natural-sounding output speech can be designed.
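
As an illustration of why finite-state vector quantization can lower the rate, the following is a minimal sketch and not the coder studied here. It assumes one common form of the technique, in which the previously chosen codeword acts as the state and restricts the next choice to a subset of the codebook, so that only the position within that subset is transmitted. The squared Euclidean distortion, the toy codebook, and the names `nearest`, `fsvq_encode`, `fsvq_decode`, and `next_allowed` are all hypothetical; the abstract does not specify the distortion measure, state definition, or codebook sizes actually used.

```python
import math


def nearest(vec, codebook, allowed):
    """Return the codeword index in `allowed` closest to `vec`
    (squared Euclidean distance, an assumed distortion measure)."""
    return min(
        allowed,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def fsvq_encode(vectors, codebook, next_allowed, start_allowed):
    """Encode each vector as its position within the subset of
    codewords permitted by the current state."""
    allowed = start_allowed
    positions = []
    for v in vectors:
        idx = nearest(v, codebook, allowed)
        positions.append(allowed.index(idx))  # transmit the short index only
        allowed = next_allowed[idx]           # the chosen codeword sets the next state
    return positions


def fsvq_decode(positions, codebook, next_allowed, start_allowed):
    """Track the same state sequence as the encoder to recover codewords."""
    allowed = start_allowed
    decoded = []
    for p in positions:
        idx = allowed[p]
        decoded.append(codebook[idx])
        allowed = next_allowed[idx]
    return decoded


# Toy data (hypothetical): 4 codewords, each state permits 2 successors.
codebook = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
next_allowed = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 0]}
start_allowed = [0, 1]
vectors = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.8), (0.2, 0.9)]

positions = fsvq_encode(vectors, codebook, next_allowed, start_allowed)
decoded = fsvq_decode(positions, codebook, next_allowed, start_allowed)

bits_plain = math.log2(len(codebook))      # bits per vector without states
bits_fsvq = math.log2(len(start_allowed))  # bits per vector with states
```

In this toy setup each vector costs one bit instead of two, because the decoder can follow the same state sequence without side information. The permitted successors in `next_allowed` are written by hand here; in practice they would have to be derived from estimated transition statistics, which is where the incomplete-data estimation problem mentioned above would arise.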
