Abstract

In previous work, a method was proposed for hidden Markov model (HMM) based speech synthesis that compensates for the divergence between the distributions of natural and generated modulation spectra (MS). This method alleviates the over-smoothing effect of parameter generation when mel-cepstral coefficients (MCCs) are used as spectral features. This paper further investigates the MS compensation method for line spectral pairs (LSPs). Four approaches to extracting MS from LSPs are implemented and compared. These approaches calculate MS vectors using original LSP sequences, log power spectra (LPS) derived from LSPs, MCCs derived from LSPs, and MCCs derived from speech waveforms, respectively. Experimental results show that the naturalness of synthetic speech improves after MS compensation when LSPs are used as spectral features for HMM modeling. The degree of improvement depends significantly on the type of spectral features used for MS calculation. MCCs derived from LSPs are more suitable for MS compensation than either original LSPs or LPS derived from LSPs. In addition, using MCCs derived from speech waveforms also achieves satisfactory performance. This means that MS compensation can also be implemented as a post-filter applied to synthetic waveforms, one that does not depend on the type of spectral features or the vocoder adopted in the synthesis system.
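As a rough illustration of what "calculating MS vectors" from a feature sequence can look like, the sketch below takes the log power spectrum of each feature dimension's trajectory over time. This is a common way to define a modulation spectrum and is assumed here; the abstract does not give the exact formulation. The function name `modulation_spectrum`, the FFT length `n_fft`, and the small flooring constant are all hypothetical choices made for the example.

```python
import numpy as np

def modulation_spectrum(trajectory, n_fft=64):
    """Sketch: log power spectrum of each feature dimension over time.

    trajectory: array of shape (T, D), i.e. T frames of D-dimensional
    spectral features (for example LSPs, LPS, or MCCs).
    Returns an array of shape (n_fft // 2 + 1, D).

    Assumption: the MS is the log-scaled power spectrum of the parameter
    trajectory. Sequences longer than n_fft are cropped by the FFT and
    shorter ones are zero-padded.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    # FFT along the time axis, independently for each feature dimension.
    spectrum = np.fft.rfft(trajectory, n=n_fft, axis=0)
    # The small floor avoids log(0); its value is an arbitrary choice.
    return np.log(np.abs(spectrum) ** 2 + 1e-10)
```

Under this assumed definition, the four approaches compared in the paper would differ only in which `(T, D)` feature sequence is passed in, not in the transform itself.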
