Abstract

In state-of-the-art text-to-speech (TTS) systems, the state durations for each phoneme are generated so as to maximise the state sequence probability, subject to the constraint that the state durations sum to the phoneme duration. This maximisation sometimes yields negative state durations when the specified phoneme duration is less than the sum of the mean durations of the phoneme's states. Such a discrepancy implicitly violates the equality constraint. This has implications for speech research problems in which each phoneme duration is specified, such as the use of a TTS synthesis system for singing voice synthesis. An algorithm for state duration assignment is derived that maximises the probability of the state sequence subject to two constraints: the state durations must sum to the total phoneme duration, and every state duration must be greater than or equal to 1. Experimental results show that the proposed algorithm always produces state durations greater than or equal to 1 while satisfying the equality constraint.
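The failure mode and one possible remedy can be illustrated with a minimal sketch. It assumes Gaussian state-duration models with mean m_k and variance v_k, for which the equality-constrained maximiser is the familiar closed form d_k = m_k + rho * v_k with rho = (T - sum m_k) / sum v_k. The second function is a generic clamp-and-resolve (active-set) scheme for the added lower bound; it is an illustrative assumption, not necessarily the algorithm derived in the paper. The function names, the example numbers, and the use of real-valued rather than integer frame counts are all hypothetical.

```python
def unconstrained_durations(means, variances, total):
    """Equality-constrained maximiser for Gaussian duration models.

    Assumed closed form: d_k = m_k + rho * v_k, where rho is chosen so
    that the durations sum to `total`. Can return values below 1, or
    negative values, when `total` is much smaller than sum(means).
    """
    rho = (total - sum(means)) / sum(variances)
    return [m + rho * v for m, v in zip(means, variances)]


def constrained_durations(means, variances, total, floor=1.0):
    """Hypothetical sketch: same objective, plus d_k >= floor for all k.

    States whose duration falls below `floor` are pinned to `floor`, and
    the remaining duration is redistributed over the states still free.
    """
    n = len(means)
    if total < floor * n:
        raise ValueError("total duration is too short for the lower bound")
    clamped = [False] * n
    while True:
        free = [i for i in range(n) if not clamped[i]]
        if not free:
            # Defensive guard against rounding when total == floor * n.
            return [floor] * n
        remaining = total - floor * (n - len(free))
        rho = (remaining - sum(means[i] for i in free)) / sum(
            variances[i] for i in free
        )
        durations = [
            floor if clamped[i] else means[i] + rho * variances[i]
            for i in range(n)
        ]
        violators = [i for i in free if durations[i] < floor]
        if not violators:
            return durations
        for i in violators:
            clamped[i] = True


if __name__ == "__main__":
    # Made-up example values for a three-state phoneme model.
    means = [3.0, 5.0, 4.0]
    variances = [1.0, 4.0, 2.0]
    print(unconstrained_durations(means, variances, 3.0))
    print(constrained_durations(means, variances, 4.0))
```

With these made-up numbers, a requested phoneme duration of 3 frames drives the high-variance middle state negative under the unconstrained rule, whereas the clamped variant keeps every state at or above 1 and still sums to the requested total. A practical system would additionally need to round the result to integer frame counts, which this sketch leaves out.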
