Abstract

This paper describes new techniques for modeling and generating speaker-dependent pitch contours for sentences. Speech synthesis applications could generally benefit from such speaker-specific pitch contours. The proposed algorithms begin with an existing pitch contour for an utterance and use data from training utterances to modify the contour so that it is appropriate for a second speaker. One approach modifies the original pitch values to statistically match those of the desired speaker at each point in time. A second, novel approach uses dynamic time warping (DTW) to select a new pitch contour from a predetermined codebook and time-align the chosen contour to the original sentence. Such contour mapping can transfer one speaker's natural pitch characteristics to another person's speech. Informal listener evaluations suggest that while shifting the frequency range of the original pitch contour yields some improvement, better results are obtained by applying DTW techniques to time-warp the contour from an existing sentence produced by the desired speaker.
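
The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`match_statistics`, `dtw`, `select_and_warp`), the mean and standard deviation mapping, the absolute-difference frame distance, and the convention that a pitch value of 0 marks an unvoiced frame are all assumptions made for illustration. In practice the contours would likely be normalized for speaker range before the DTW comparison.

```python
import math
from statistics import mean, pstdev


def match_statistics(source_f0, target_mean, target_std):
    """Shift and scale voiced pitch values (Hz) toward a target speaker's
    mean and standard deviation. Frames with value 0 are treated as
    unvoiced and left unchanged (an assumed convention)."""
    voiced = [f for f in source_f0 if f > 0]
    src_mean = mean(voiced)
    src_std = pstdev(voiced) or 1.0  # guard against a flat contour
    scale = target_std / src_std
    return [target_mean + (f - src_mean) * scale if f > 0 else 0.0
            for f in source_f0]


def dtw(a, b):
    """Classic dynamic time warping between two sequences.
    Returns (total cost, path), where path is a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed frame distance
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def select_and_warp(source_f0, codebook):
    """Pick the codebook contour with the lowest DTW cost against the
    source contour, then resample it onto the source's time axis by
    averaging the codebook frames aligned to each source frame."""
    best = min(codebook, key=lambda c: dtw(source_f0, c)[0])
    _, path = dtw(source_f0, best)
    buckets = [[] for _ in source_f0]
    for i, j in path:
        buckets[i].append(best[j])
    return [mean(b) for b in buckets]
```

Here `match_statistics` corresponds to the first approach (changing the pitch range while keeping the original contour shape), and `select_and_warp` to the second (replacing the contour with one from the desired speaker, warped to the original timing). Averaging the frames aligned to each source frame is only one possible way to apply the warp.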