Abstract
We propose a corpus-based singing voice synthesis system that combines a hidden Markov model (HMM) with a deep neural network (DNN). Recent work in text-to-speech synthesis has reported that DNN-based synthesis yields better speech quality than HMM-based synthesis. However, when we applied the DNN directly to statistical singing voice synthesis, it did not improve the quality of the synthetic singing voice. We therefore used the DNN in a different way: instead of modeling the speech spectra directly, we used it to model the difference between the spectra of natural singing voice and those of the singing voice synthesized by the HMM. Specifically, the DNN maps input musical information, such as lyrics, tones, and durations, to the difference in acoustic features between the natural and synthetic singing voice. This allows us to reconstruct the spectral fine structure of the singing voice generated by the HMM. Our results showed that the proposed method improved the quality of the synthetic singing voice compared with the conventional methods.
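The difference-modeling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a least-squares linear map stands in for the DNN, the score features and spectra are random toy data, and all function names (`residual_targets`, `fit_residual_model`, `predict_residual`, `compensate`) are hypothetical. The only point it shows is the data flow: train on (natural minus HMM output), then add the predicted difference back to the HMM output.

```python
import numpy as np

def residual_targets(natural, hmm_generated):
    """Training targets: per-frame difference between natural and HMM features."""
    return natural - hmm_generated

def _with_bias(score_features):
    """Append a constant column so the linear map can learn an offset."""
    ones = np.ones((score_features.shape[0], 1))
    return np.hstack([score_features, ones])

def fit_residual_model(score_features, targets):
    """Fit a linear map from score features to residuals.
    ASSUMPTION: this least-squares model is only a stand-in for the DNN."""
    weights, *_ = np.linalg.lstsq(_with_bias(score_features), targets, rcond=None)
    return weights

def predict_residual(weights, score_features):
    """Predict the natural-minus-HMM difference from score features."""
    return _with_bias(score_features) @ weights

def compensate(hmm_generated, predicted_residual):
    """Add the predicted difference back onto the HMM-generated features."""
    return hmm_generated + predicted_residual

# Toy data (hypothetical): 200 frames, 6 score features, 4 spectral coefficients.
rng = np.random.default_rng(0)
n_frames, n_score, n_spec = 200, 6, 4
score = rng.normal(size=(n_frames, n_score))      # stands in for lyrics/tone/duration features
natural = score @ rng.normal(size=(n_score, n_spec)) + 0.5
hmm = 0.6 * natural                               # crude imitation of an over-smoothed HMM output

weights = fit_residual_model(score, residual_targets(natural, hmm))
compensated = compensate(hmm, predict_residual(weights, score))

error_before = np.mean((natural - hmm) ** 2)
error_after = np.mean((natural - compensated) ** 2)
```

In this toy setup the residual is exactly linear in the score features, so the compensated output matches the natural features closely; real spectra would need a nonlinear model such as the DNN the abstract describes.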