Integration of speaker and pitch adaptive training for HMM-based singing voice synthesis

Kanako Shirota, Yoshihiko Nankaku, Kazuhiro Nakamura, Keiichiro Oura, Kei Hashimoto, Keiichi Tokuda

doi:10.1109/icassp.2014.6854062

Publication Date: May 1, 2014

Citations: 12

Affiliation: Nagoya Institute of Technology

Abstract

A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has been growing in popularity over the last few years. In this approach, the spectrum, excitation, vibrato, and duration of the singing voice are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. Since HMM-based singing voice synthesis systems are “corpus-based,” the HMMs corresponding to contextual factors that rarely appear in the training data cannot be trained well. However, it can be difficult to prepare a sufficiently large quantity of singing voice data sung by a single singer. Furthermore, the pitches that appear in each song are imbalanced, and they are limited by the vocal range of the singer. In this paper, we propose “singer adaptive training,” which can solve the data sparseness problem. Experimental results demonstrated that the proposed technique improved the quality of the synthesized singing voices.
