Abstract

This paper investigates the effect of Modulation Spectrum (MS)-based preprocessing on Deep Neural Network (DNN)-based speech synthesis using Fast Fourier Transform (FFT) spectra. Preprocessing the training data is an effective way to improve the accuracy of acoustic model training. We previously proposed an MS-based preprocessing method, called speech parameter trajectory smoothing, for DNN-based synthesis using vocoder parameters, and confirmed that it improves training accuracy by removing components that are hard for the acoustic models to capture. Meanwhile, DNN-based synthesis using FFT spectra was recently proposed and has great potential for a wider range of applications. This paper investigates whether trajectory smoothing of the FFT spectra is also effective. The experimental evaluation demonstrates that trajectory smoothing reduces the mean squared error between the predicted and target FFT spectra.
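
As a rough illustration of the idea of trajectory smoothing, the sketch below low-pass filters the temporal trajectory of each frequency bin in the modulation-frequency domain. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the ideal (brick-wall) filter, the 5 ms frame shift, and the 50 Hz cutoff are all hypothetical choices made for this example, and the input is assumed to be a frames-by-bins list of log-amplitude values.

```python
import cmath
import math


def smooth_trajectory(traj, frame_shift_s, cutoff_hz):
    """Low-pass one parameter trajectory in the modulation-frequency domain.

    Hypothetical helper: uses an ideal brick-wall filter for illustration.
    """
    n = len(traj)
    # Forward DFT of the temporal sequence (its modulation spectrum).
    spec = [
        sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    frame_rate = 1.0 / frame_shift_s
    for k in range(n):
        # Modulation frequency of bin k; the upper half mirrors the lower half.
        mod_freq = min(k, n - k) * frame_rate / n
        if mod_freq > cutoff_hz:
            spec[k] = 0.0
    # Inverse DFT. The zeroing is symmetric, so the imaginary part is round-off.
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def smooth_spectrogram(log_spec, frame_shift_s=0.005, cutoff_hz=50.0):
    """Smooth every frequency bin's trajectory; log_spec is frames x bins.

    The default frame shift and cutoff are assumptions, not values from the paper.
    """
    n_bins = len(log_spec[0])
    columns = [
        smooth_trajectory([frame[b] for frame in log_spec], frame_shift_s, cutoff_hz)
        for b in range(n_bins)
    ]
    # Transpose back to frames x bins.
    return [list(row) for row in zip(*columns)]
```

In this sketch the smoothed spectra would replace the original ones as training targets, so the acoustic model is no longer asked to reproduce the fast frame-to-frame fluctuations. The direct DFT is quadratic in the number of frames and is used only to keep the example dependency-free; an FFT routine would be the practical choice.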
