Abstract

This paper proposes a simple and efficient variance compensation technique to improve the perceptual quality of synthetic speech in parametric speech synthesis. First, we analyze the problem of spectral and F0 enhancement with global variance (GV) in HMM-based speech synthesis. In conventional GV-based parameter generation, the enhancement is achieved by taking into account a GV probability density function with fixed GV model parameters for every output utterance during the speech parameter generation process. We find that using fixed GV parameters produces much smaller variation of GVs across synthesized utterances than is observed in natural speech. In addition, the computational cost is high because of iterative optimization. This paper examines these issues using multiple objective measures, including variance characteristics, GV distortions, and GV correlations. We propose a simple and fast compensation method based on a global affine transformation that yields a GV distribution closer to that of natural speech and improves the correlation of GVs between natural and generated parameter sequences. Experimental results demonstrate that the proposed variance compensation method outperforms conventional GV-based parameter generation in objective and subjective similarity to natural speech while maintaining naturalness.
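The core idea of affine variance compensation can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the authors' exact method: it applies a per-dimension affine transform to a generated parameter trajectory so that its global variance moves toward a target GV (e.g., one estimated from natural speech). The function name, the `alpha` interpolation weight, and the target-GV source are assumptions for illustration.

```python
import numpy as np

def compensate_variance(traj, gv_target, alpha=1.0):
    """Hypothetical sketch of GV compensation via a global affine transform.

    traj:      (T, D) generated parameter trajectory (T frames, D dimensions)
    gv_target: (D,) target global variance per dimension
    alpha:     interpolation weight between generated and target GV
    """
    mean = traj.mean(axis=0)
    gv_gen = traj.var(axis=0)
    # Scale factor that maps the generated variance onto the
    # interpolated target variance; the transform is affine in each
    # dimension: y_t = scale * (x_t - mean) + mean.
    target = alpha * gv_target + (1.0 - alpha) * gv_gen
    scale = np.sqrt(target / np.maximum(gv_gen, 1e-12))
    return (traj - mean) * scale + mean
```

Because the transform is a closed-form affine map rather than an iterative optimization, it is fast and preserves the trajectory mean while exactly matching the interpolated target variance in each dimension.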
