Spectral and pitch modeling with hybrid approach to singing voice synthesis using hidden semi-Markov model and deep neural network

Kouki Hongo, Takashi Nose, Akinori Ito

doi:10.1121/1.4969155


Abstract

We propose a corpus-based singing voice synthesis system that combines the hidden Markov model (HMM) and the deep neural network (DNN). In text-to-speech synthesis, DNN-based methods have recently been reported to give better speech quality than HMM-based ones. However, when we applied the DNN directly to statistical singing voice synthesis, it did not improve the quality of the synthetic singing voice. We therefore introduced the DNN in a different way. Instead of modeling the speech spectra themselves, we used the DNN to model the difference between the spectra of the natural singing voice and those of the singing voice synthesized by the HMM. Specifically, the DNN maps input musical information, such as lyrics, tones, and durations, to the difference in output acoustic features between the natural and synthetic singing voices. This allows us to reconstruct the spectral fine structure of the singing voice generated by the HMMs. Our results showed that the proposed method improved the quality of the synthetic singing voice compared with the conventional methods.
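The residual idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the "HMM output" is simply a damped copy of the toy "natural" frames, the four-dimensional input stands in for an encoding of lyrics, tones, and durations, and a small one-hidden-layer network written in NumPy stands in for the DNN. All function names, dimensions, and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def train_residual_mlp(x, residual, hidden=16, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh MLP to the residual (full-batch gradient descent)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, residual.shape[1]))
    b2 = np.zeros(residual.shape[1])
    n = x.shape[0]
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        err = h @ w2 + b2 - residual            # gradient of 0.5 * squared error
        grad_h = (err @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        w1 -= lr * (x.T @ grad_h) / n
        b1 -= lr * grad_h.mean(axis=0)
    return w1, b1, w2, b2


def predict_residual(params, x):
    """Predict the natural-minus-HMM difference from the input features."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def demo(seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(200, 4))            # hypothetical score features per frame
    mixing = rng.normal(0.0, 0.5, (4, 3))
    natural = np.sin(x @ mixing)             # toy "natural" spectral frames
    hmm_out = 0.5 * natural                  # toy over-smoothed "HMM" frames
    train, test = slice(0, 150), slice(150, 200)

    # Training target: what the HMM output is missing, frame by frame.
    params = train_residual_mlp(x[train], natural[train] - hmm_out[train])

    # Synthesis: HMM output plus the predicted residual.
    corrected = hmm_out[test] + predict_residual(params, x[test])
    mse_hmm = float(np.mean((natural[test] - hmm_out[test]) ** 2))
    mse_corrected = float(np.mean((natural[test] - corrected) ** 2))
    return mse_hmm, mse_corrected


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the training target: the network never predicts the spectrum itself, only the correction that is added back to the HMM output at synthesis time.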
