Abstract

Speech is decomposed into three components, following the idea of Daudet and Torresani (Signal Processing, vol. 82, no. 11, pp. 1595, 2002): signal = tonal + transient + residual. The tonal and transient components are each represented by a small number of coefficients, of the modified discrete cosine transform (MDCT) and the wavelet transform, respectively. In the algorithm of Daudet and Torresani, referred to as the D&T algorithm, the significant MDCT and wavelet coefficients are selected by thresholding. All MDCT coefficients are assumed to be independent, as are all wavelet coefficients. However, the MDCT coefficients probably have statistical dependencies, namely the clustering and persistence properties, and so do the wavelet coefficients. We propose a modification to the D&T algorithm that captures these statistical dependencies by means of a hidden Markov model. The Viterbi and maximum a posteriori (MAP) algorithms, used to find the optimal state distribution, are applied to determine the significant MDCT and wavelet coefficients automatically. The modified algorithm was used to encode 43 monosyllabic consonant-vowel-consonant (CVC) words and 3 sentences. Results showed that the modified algorithm improves the coding efficiency by 37% compared with the threshold method of the D&T algorithm when equal numbers of significant coefficients are used.
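The difference between threshold selection and selection by a hidden Markov model can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a simple two-state chain over a sequence of coefficients, zero-mean Gaussian emissions with a small variance for the insignificant state and a large variance for the significant state, and a fixed probability of staying in the same state. The function names and the values of `sigmas`, `stay_prob`, and `prior` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x * x) / (2.0 * sigma * sigma)


def threshold_select(coeffs, threshold):
    """Baseline: a coefficient is significant (1) when its magnitude exceeds the threshold."""
    return [1 if abs(c) > threshold else 0 for c in coeffs]


def viterbi_select(coeffs, sigmas=(0.1, 1.0), stay_prob=0.9, prior=(0.5, 0.5)):
    """Most likely state path (0 = insignificant, 1 = significant) of a 2-state HMM.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    if not coeffs:
        return []
    states = range(2)
    # Log transition matrix: staying in a state is more likely than switching.
    log_trans = [
        [math.log(stay_prob if i == j else 1.0 - stay_prob) for j in states]
        for i in states
    ]
    # delta[s] holds the best log-probability of any path ending in state s.
    delta = [math.log(prior[s]) + gaussian_logpdf(coeffs[0], sigmas[s]) for s in states]
    backpointers = []
    for c in coeffs[1:]:
        new_delta, pointers = [], []
        for s in states:
            best_prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(
                delta[best_prev] + log_trans[best_prev][s] + gaussian_logpdf(c, sigmas[s])
            )
            pointers.append(best_prev)
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For the toy sequence `[0.02, -0.05, 0.9, 0.25, -1.1, 0.8, 0.03, -0.01]`, a threshold of 0.5 rejects the coefficient 0.25, whereas the chain model under the assumed parameters labels it significant because its neighbours are large. This is the kind of clustering behaviour that an independent threshold cannot express.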
