Abstract
The accurate modelling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor for achieving high-quality speech. However, it is also difficult because F0 values are normally considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. That is, the estimated F0 value is a discontinuous function of time, whose domain is partly continuous and partly discrete. This chapter investigates two statistical frameworks for dealing with the discontinuity of F0. Discontinuous F0 modelling strictly defines the probability of a random variable with a discontinuous domain and models it directly. A widely used approach within this framework is the multi-space probability distribution (MSD). An alternative framework is continuous F0 modelling, where continuous F0 observations are assumed to always exist and voicing classification is modelled separately. Both theoretical and experimental comparisons of the two frameworks are given.
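As a rough illustration of the two frameworks, the sketch below represents the same short F0 track in both ways. It is only a minimal sketch under stated assumptions, not the chapter's method: the function names are hypothetical, unvoiced frames are assumed to be marked with 0 in the input, and linear interpolation of log-F0 through unvoiced regions is one simple way to obtain a continuous observation in every frame.

```python
import numpy as np

def to_discontinuous(f0):
    """Discontinuous (MSD-style) view: each frame is a (space, value) pair.
    Voiced frames carry a log-F0 value; unvoiced frames carry no value.
    Assumption: unvoiced frames are marked with 0 in the input."""
    return [("voiced", float(np.log(v))) if v > 0 else ("unvoiced", None)
            for v in f0]

def to_continuous(f0):
    """Continuous view: a log-F0 value for every frame plus a separate
    voicing flag. Assumption: values in unvoiced frames are filled by linear
    interpolation, with the nearest voiced value held at the track edges.
    Requires at least one voiced frame."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    idx = np.arange(len(f0))
    log_f0 = np.interp(idx, idx[voiced], np.log(f0[voiced]))
    return log_f0, voiced

# Hypothetical F0 track in Hz (0 marks an unvoiced frame).
f0_track = [0.0, 120.0, 125.0, 0.0, 0.0, 130.0, 0.0]
msd_frames = to_discontinuous(f0_track)
cont_log_f0, voicing = to_continuous(f0_track)
```

In the first view the observation itself switches between a discrete symbol and a real number, so the model must handle a mixed domain. In the second, the F0 stream is an ordinary continuous sequence and the voicing flags form a separate stream to be modelled on their own.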