Abstract

We describe improvements to the 2400 bit/s mixed excitation linear prediction (MELP) voice codec, which was issued on June 12, 1997 as a draft US Federal Information Processing Standard for analog-to-digital conversion of voice. For pitch estimation, MELP applies a pitch-doubling check algorithm and a strong-voice pitch smoothing algorithm. However, these algorithms are too simple to produce an accurate and smooth pitch period: pitch jumps sometimes occur, especially during voice transitions, and about 5% to 10% of the pitch estimates remain incorrect. To obtain a more accurate and smoother pitch period, we apply a dynamic frame relative smoothing algorithm to optimize the pitch period in MELP. After pitch smoothing, almost all of these errors are eliminated. To adapt the codec to Chinese, we retrain the MELP prediction-parameter codebook on a Chinese voice database using the simulated annealing algorithm. An Itakura distance test of distortion shows that the codebook obtained by simulated annealing has less distortion than the codebook obtained by the traditional LBG algorithm: the probability that the distortion falls within an Itakura distance of -0.1 to 0.1 is 15% greater for the former than for the latter. The enhanced algorithm thus gives a better codebook and more fluent synthetic Chinese speech.
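The abstract does not specify the dynamic frame relative smoothing algorithm, so the following is only a minimal sketch of the general idea of correcting pitch jumps against neighbouring frames. It substitutes a plain median-based smoother with an octave check, which is not the authors' method. The function name `smooth_pitch`, the parameters `half_window` and `tolerance`, and the assumption that every frame is voiced (nonzero period) are all hypothetical choices made for illustration.

```python
from statistics import median


def smooth_pitch(periods, half_window=2, tolerance=0.2):
    """Correct octave jumps and outliers in a per-frame pitch-period track.

    Illustrative stand-in only: a median-based smoother, not the paper's
    dynamic frame relative smoothing algorithm. Assumes all frames are voiced.
    """
    smoothed = list(periods)
    for i, p in enumerate(periods):
        # Reference value: median of the original estimates around frame i.
        lo = max(0, i - half_window)
        hi = min(len(periods), i + half_window + 1)
        ref = median(periods[lo:hi])

        # Estimates close to the local median are left unchanged.
        if abs(p - ref) <= tolerance * ref:
            continue

        # Try to explain the jump as pitch doubling or halving.
        for factor in (0.5, 2.0):
            if abs(p * factor - ref) <= tolerance * ref:
                smoothed[i] = p * factor
                break
        else:
            # Neither octave correction fits: fall back to the local median.
            smoothed[i] = ref
    return smoothed


if __name__ == "__main__":
    track = [50, 51, 100, 52, 50, 49, 25, 51]  # made-up pitch periods in samples
    print(smooth_pitch(track))
```

In this sketch a doubled estimate (100) and a halved estimate (25) are pulled back to the neighbouring range, while an isolated outlier that is not an octave error is replaced by the local median. A real codec would also need to handle unvoiced frames and voicing transitions, which this example ignores.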
