Abstract
In current hidden Markov model (HMM) based unit selection speech synthesis, the optimal phone-sized candidate units are selected according to the maximum likelihood (ML) criterion of HMMs trained on various acoustic features. This paper introduces statistical models of syllable-level F0 features into this method. In contrast to the frame-level F0 parameters used in the current framework, the pitch contour of the vowel in each syllable, and its combination across adjacent syllables, are extracted to represent the suprasegmental properties of F0. A context-dependent statistical model is trained on these syllable-level F0 features, and its likelihood function is integrated into the unit selection criterion to evaluate the suprasegmental prosody of a given unit sequence. The conventional dynamic programming search algorithm for phone-sized unit selection is modified to account for the dependency, introduced by the syllable-level F0 modeling, between the candidate units for the vowels of adjacent syllables. Our experimental results show that this method significantly improves the naturalness of synthesized speech.
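To illustrate why syllable-level F0 modeling changes the search, the following is a minimal sketch rather than the algorithm used in the paper. It assumes every score is a log-likelihood to be summed, and it extends the usual dynamic programming state (the unit chosen at the current phone) with the most recently chosen vowel unit, so that a vowel-to-vowel term can be scored even when consonants lie in between. The function `select_units` and its callback parameters `target`, `join`, and `syllable` are hypothetical names introduced only for this example.

```python
def select_units(candidates, is_vowel, target, join, syllable):
    """Return (best_score, best_path) maximising the summed log-scores.

    All names here are illustrative assumptions, not the paper's notation.
    candidates[i]  : list of candidate units for phone position i
    is_vowel[i]    : True if position i holds the vowel of a syllable
    target(i, u)   : phone-level log-score of unit u at position i
    join(u, v)     : log-score for concatenating adjacent units u and v
    syllable(p, v) : log-score linking the vowel units of adjacent syllables
    """
    # State = (unit at current position, most recent vowel unit).
    # Carrying the last vowel unit lets the search score dependencies
    # between vowels that are not adjacent in the phone sequence.
    beams = {}
    for u in candidates[0]:
        last_vowel = u if is_vowel[0] else None
        beams[(u, last_vowel)] = (target(0, u), [u])

    for i in range(1, len(candidates)):
        new_beams = {}
        for (prev, last_vowel), (score, path) in beams.items():
            for u in candidates[i]:
                s = score + target(i, u) + join(prev, u)
                new_last = last_vowel
                if is_vowel[i]:
                    if last_vowel is not None:
                        # Syllable-level term between adjacent vowels.
                        s += syllable(last_vowel, u)
                    new_last = u
                key = (u, new_last)
                # Keep only the best-scoring path per state.
                if key not in new_beams or s > new_beams[key][0]:
                    new_beams[key] = (s, path + [u])
        beams = new_beams

    return max(beams.values(), key=lambda item: item[0])
```

Under these assumptions the number of states per position grows from the number of candidates to roughly that number times the number of candidates for the preceding vowel. That growth is the cost of keeping the search exact once the criterion links non-adjacent units.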