Abstract

This paper proposes a new approach of parameterizing the excitation signal for improving the quality of HMM-based speech synthesis system. The proposed method tries to model the excitation or residual signal by segregating the regions of the residual signal based on their perceptual importance. Initially, a study on the characteristics of the residual signal around glottal closure instant (GCI) is performed using principal component analysis (PCA). Based on the present study, and from the previous literature (Adiga and Prasanna in Proceedings of Interspeech, pp 1677–1681, 2013; Cabral in Proceedings of Interspeech, pp 1082–1086, 2013), it is concluded that the segment of the residual signal around GCI which carries perceptually important information is considered as the deterministic component and the remaining part of the residual signal is considered as the noise component. The deterministic component is compactly represented using PCA coefficients (with about 95% accuracy), and the noise component is parameterized in terms of spectral and amplitude envelopes. The proposed excitation modeling approach is incorporated in the HMM-based speech synthesis system. Subjective evaluation results show a significant improvement of quality for both female and male speakers’ speech synthesized by the proposed method, compared to three existing excitation modeling methods. Accurate parameterization of the segment of the residual signal around GCI resulted in the improvement of the quality of the synthesized speech. Synthesized speech samples of the proposed and existing source models are made available online at http://www.sit.iitkgp.ernet.in/~ksrao/parametric-hts/pcd-hts.html.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call