Abstract
Composite source (CS) models for speech signals consist of a finite number of subsources, each describing specific spectral characteristics, and a switch process representing the variation of these characteristics. They allow the computation of rate distortion functions (RDFs) that lie significantly below those for simple stochastic models. This paper discusses how to estimate a switch state sequence that is optimal for the computation of an RDF. Using a maximum-likelihood approach, a new relationship between Itakura-Saito clustering and rate distortion theory for composite sources is derived. The performance bounds are based on the mean squared error distortion measure. It is shown that, for a given number of subsources, the resulting composite source model has the lowest possible RDF in the range of small distortion values. In addition, a lower bound on the RDF of a composite source with an arbitrarily large number of subsources is derived. Finally, it is shown that rate distortion theory for composite sources provides explicit bit-allocation rules for a class of blockwise adaptive (switched) waveform coding schemes with time-varying transmission rate.
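To make the small-distortion regime and the resulting bit allocation concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: the subsources are memoryless Gaussian with known variances, the switch state is known to both encoder and decoder, and the distortion D does not exceed the smallest subsource variance. Under these assumptions every subsource is coded at the same distortion D, and the composite RDF is the probability-weighted average of the per-subsource rates 0.5 log2(variance / D). The function name and all parameter names are hypothetical.

```python
import math


def composite_rdf_small_distortion(probs, variances, distortion):
    """Sketch of the RDF of a composite source at small MSE distortion.

    Assumptions (illustrative, not from the paper): memoryless Gaussian
    subsources, switch state known at encoder and decoder, and
    0 < distortion <= min(variances).

    Returns (average_rate, per_subsource_rates) in bits per sample.
    """
    if len(probs) != len(variances):
        raise ValueError("probs and variances must have the same length")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("subsource probabilities must sum to 1")
    if distortion <= 0 or distortion > min(variances):
        raise ValueError("sketch only covers 0 < D <= smallest variance")

    # Bit allocation: each subsource gets the Gaussian rate at distortion D.
    rates = [0.5 * math.log2(v / distortion) for v in variances]
    # The composite rate is the switch-probability-weighted average.
    average_rate = sum(p * r for p, r in zip(probs, rates))
    return average_rate, rates


def single_gaussian_rdf(variance, distortion):
    """RDF of one memoryless Gaussian source, for 0 < D <= variance."""
    return 0.5 * math.log2(variance / distortion)


if __name__ == "__main__":
    probs = [0.5, 0.5]
    variances = [1.0, 4.0]
    d = 0.25
    avg, rates = composite_rdf_small_distortion(probs, variances, d)
    # Simple model for comparison: one Gaussian with the overall variance.
    overall_variance = sum(p * v for p, v in zip(probs, variances))
    print("per-subsource rates:", rates)
    print("composite rate:", avg)
    print("single-Gaussian rate:", single_gaussian_rdf(overall_variance, d))
```

In this toy setting the composite rate falls below the rate of a single Gaussian model with the same overall variance, which illustrates the kind of gap the abstract describes. The per-subsource rates play the role of a time-varying bit allocation for a switched coder. The paper's actual models use spectral (not memoryless) subsources, so the numbers here are only illustrative.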