Abstract

A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, the spectrum, excitation, vibrato, and duration of the singing voice are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. Because HMM-based singing voice synthesis systems are "corpus-based," HMMs corresponding to contextual factors that rarely appear in the training data cannot be trained well. However, it can be difficult to collect a sufficiently large amount of singing voice data from a single singer. Furthermore, the distribution of pitches within each song is imbalanced, and the pitches available in the data are limited by the singer's vocal range. In this paper, we propose "singer adaptive training," which addresses this data sparseness problem. Experimental results demonstrated that the proposed technique improved the quality of the synthesized singing voices.
