Abstract
We propose a corpus-based singing voice synthesis system that combines a hidden Markov model (HMM) with a deep neural network (DNN). In text-to-speech synthesis, DNN-based methods have recently been reported to produce better speech quality than HMM-based ones. However, when we applied a DNN directly to statistical singing voice synthesis, it did not improve the quality of the synthetic singing voice. We therefore use the DNN in a different way. Instead of modeling the speech spectra themselves, the DNN models the difference between the spectra of the natural singing voice and those of the singing voice synthesized by the HMM. Specifically, the DNN maps input musical information, such as lyrics, tones, and durations, to the difference in output acoustic features between the natural and synthetic singing voices. This allows us to reconstruct the spectral fine structure of singing voice generated by HMMs. Our results showed that the proposed method improved the quality of the synthetic singing voice compared with conventional methods.
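The difference-modeling idea can be illustrated with a minimal sketch. It is not the system described here: a least-squares linear map stands in for the DNN, and the function names, feature shapes, and toy data are all assumptions made for illustration. What it shows is the structure of the approach: the model is trained on the natural-minus-HMM spectral difference, and at synthesis time its prediction is added back to the HMM output.

```python
import numpy as np


def _with_bias(feats):
    """Append a constant column so the linear stand-in can learn an offset."""
    return np.hstack([feats, np.ones((feats.shape[0], 1))])


def fit_residual_model(music_feats, natural_spec, hmm_spec):
    """Fit a map from musical features to the spectral difference.

    A linear least-squares fit is used as a stand-in for the DNN
    (assumption: the real system uses a neural network here).
    """
    target = natural_spec - hmm_spec  # difference to be modeled
    weights, *_ = np.linalg.lstsq(_with_bias(music_feats), target, rcond=None)
    return weights


def enhance(music_feats, hmm_spec, weights):
    """Add the predicted difference back onto the HMM-generated spectra."""
    return hmm_spec + _with_bias(music_feats) @ weights


# Toy data (hypothetical): 50 frames, 4 musical features, 3 spectral bins.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 4))
hmm_spec = rng.normal(size=(50, 3))
natural_spec = hmm_spec + feats @ rng.normal(size=(4, 3)) + 0.5

weights = fit_residual_model(feats, natural_spec, hmm_spec)
enhanced = enhance(feats, hmm_spec, weights)
max_error = float(np.max(np.abs(enhanced - natural_spec)))
```

In this toy setting the difference is exactly linear in the features, so the stand-in model can recover it; with real singing data the mapping is nonlinear, which is where a DNN would be expected to help.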