Abstract

One of the most successful approaches to synthesizing speech, concatenative synthesis, combines recorded speech units to build full utterances. However, the prosody of the stored units is often not consistent with that of the target utterance and must be altered. Furthermore, several types of mismatch can occur at unit boundaries and must be smoothed. Thus, pitch and time-scale modification techniques, together with smoothing algorithms, play a crucial role in such concatenation-based systems. In this paper, we describe novel approaches to each of these issues. First, we present a conceptually simple technique for pitch and time-scale modification of speech. Our method is based upon a harmonic coding of each speech frame and operates entirely within the original sinusoidal model. Crucially, it makes no use of "pitch pulse onset times." Instead, phase coherence, and thus shape invariance, is ensured by exploiting the harmonic relation between the sine waves used to code each analysis frame, so that their phases at each synthesis frame boundary are consistent with those derived during analysis. Second, we describe a smoothing algorithm aimed specifically at correcting phase mismatches at unit boundaries. Results are presented showing our prosodic modification techniques to be highly suitable for use within a concatenative speech synthesizer.
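
The phase-coherence idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`synth_frame`, `modify`), the per-frame tuple layout, and the choice to hold harmonic amplitudes fixed when the pitch changes are all assumptions made for illustration. Each frame is a sum of harmonics of a fundamental, harmonic k carries phase k times the running fundamental phase plus a fixed offset taken from analysis, and the fundamental phase is carried across frame boundaries so that no pitch pulse onset estimate is needed.

```python
import math

def synth_frame(amps, rel_phases, f0, fs, n_samples, theta_start):
    """Synthesize one frame as a sum of harmonics of f0 (hypothetical helper).

    Harmonic k has phase k * theta + rel_phases[k - 1], where theta is the
    running phase of the fundamental. Returns the samples and the fundamental
    phase at the frame's end boundary.
    """
    step = 2.0 * math.pi * f0 / fs  # fundamental phase advance per sample
    frame = []
    for n in range(n_samples):
        theta = theta_start + step * n
        frame.append(sum(
            a * math.cos(k * theta + p)
            for k, (a, p) in enumerate(zip(amps, rel_phases), start=1)
        ))
    return frame, theta_start + step * n_samples

def modify(frames, fs, pitch_scale=1.0, time_scale=1.0):
    """Resynthesize frames with scaled pitch and duration (illustrative only).

    frames: list of (amps, rel_phases, f0, n_samples) tuples, an assumed layout.
    Amplitudes are reused unchanged here; a real system would presumably
    re-estimate them at the new harmonic frequencies and drop any harmonic
    above the Nyquist frequency.
    """
    theta = 0.0  # fundamental phase carried across frame boundaries
    out = []
    for amps, rel_phases, f0, n_samples in frames:
        n_out = int(round(n_samples * time_scale))
        frame, theta = synth_frame(amps, rel_phases, f0 * pitch_scale,
                                   fs, n_out, theta)
        out.extend(frame)
    return out
```

Because the offsets in `rel_phases` stay fixed relative to the fundamental, the waveform shape within each pitch period is preserved whether frames are stretched (`time_scale`) or the fundamental is moved (`pitch_scale`), and consecutive frames join without a phase jump.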
