Minimum generation error training by using original spectrum as reference for log spectral distortion measure

Yi-Jian Wu, Keiichi Tokuda

doi:10.1109/icassp.2009.4960508

Abstract

This paper improves the minimum generation error (MGE) based HMM training technique for HMM-based speech synthesis by directly using the original spectrum, instead of line spectral pairs (LSPs), as the reference spectrum for the log spectral distortion (LSD) measure. Two types of original reference spectra for the LSD calculation are investigated: the spectrum extracted from the speech waveform by STRAIGHT, and the short-time FFT spectrum calculated from the speech waveform. Since only the harmonics of the FFT spectrum coincide with the underlying spectral envelope, the LSD between the generated LSPs and the original FFT spectrum is calculated by sampling at the harmonic frequencies, and a weighting function is designed to simulate the sampling strategy on LSPs. In the experiments, MGE-LSD training using the FFT spectrum as the reference spectrum achieved the best performance.
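To make the harmonic-sampling idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the LSD is the root-mean-square difference of two magnitude spectra in dB, and that harmonic frequencies are mapped to the nearest FFT bin; the abstract does not give the exact definition. The function names `harmonic_bins` and `log_spectral_distortion` are hypothetical, and the weighting function mentioned in the abstract is not modelled here.

```python
import numpy as np


def harmonic_bins(f0_hz, sample_rate, n_fft):
    """Return the FFT bin indices nearest to the harmonics k * f0 up to Nyquist.

    Assumption: nearest-bin rounding stands in for whatever interpolation
    the original method uses.
    """
    n_harmonics = int((sample_rate / 2) // f0_hz)
    harmonics_hz = f0_hz * np.arange(1, n_harmonics + 1)
    bins = np.rint(harmonics_hz * n_fft / sample_rate).astype(int)
    return np.unique(np.clip(bins, 0, n_fft // 2))


def log_spectral_distortion(mag_a, mag_b, bins=None, eps=1e-10):
    """RMS difference in dB between two magnitude spectra.

    If `bins` is given, only those frequency bins contribute, which mimics
    sampling the distortion at the harmonic frequencies.
    """
    log_a = 20.0 * np.log10(np.maximum(mag_a, eps))
    log_b = 20.0 * np.log10(np.maximum(mag_b, eps))
    diff = log_a - log_b
    if bins is not None:
        diff = diff[bins]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    sample_rate, n_fft, f0_hz = 16000, 1024, 200.0
    bins = harmonic_bins(f0_hz, sample_rate, n_fft)

    # Toy data: a flat "envelope" and an "FFT spectrum" that matches it only
    # at the harmonic bins and is much lower in between.
    envelope = np.ones(n_fft // 2 + 1)
    fft_mag = np.full(n_fft // 2 + 1, 0.01)
    fft_mag[bins] = 1.0

    print("LSD over all bins:     ", log_spectral_distortion(envelope, fft_mag))
    print("LSD over harmonic bins:", log_spectral_distortion(envelope, fft_mag, bins))
```

In this toy case the distortion over all bins is dominated by the valleys between harmonics, while the distortion restricted to harmonic bins is zero, which illustrates why comparing an envelope against a raw FFT spectrum is only meaningful at the harmonics.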
