Abstract

This study presents a hierarchical pitch conversion method that uses regression-based clustering to model the conversion functions. The pitch contour of a speech utterance is first extracted and decomposed, top-down, into sentence-, word-, and sub-syllable-level features. The paired source and target pitch feature vectors at each level are then clustered to generate the pitch conversion functions. Regression-based clustering, which groups the feature vectors so as to minimize the conversion error between the predicted and the actual target feature vectors, is proposed for generating these functions. A classification and regression tree (CART) that incorporates linguistic, phonetic, and source prosodic features is adopted to select the most suitable function for pitch conversion. Several objective and subjective evaluations were conducted, and comparisons with GMM-based pitch conversion methods confirm the performance of the proposed regression-based clustering approach.
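
The idea of clustering paired vectors by conversion error, rather than by distance in feature space, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the regression, the number of clusters, or the initialization, so the affine per-cluster mapping, the random initial assignment, and all function names below (`fit_linear`, `predict`, `regression_clustering`, `total_error`) are assumptions made for illustration. The CART-based selection of a function at conversion time is not shown.

```python
import numpy as np


def fit_linear(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W (affine map, an assumed model)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W


def predict(W, X):
    """Apply the affine map W to source vectors X."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ W


def regression_clustering(X, Y, k, n_iter=20, seed=0):
    """Alternate between fitting one regression per cluster and
    reassigning each (source, target) pair to the cluster whose
    regression predicts its target with the smallest squared error."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=X.shape[0])  # random start (assumption)
    models = [None] * k
    for _ in range(n_iter):
        for c in range(k):
            idx = labels == c
            if idx.sum() > X.shape[1]:
                # Enough pairs to refit this cluster's regression.
                models[c] = fit_linear(X[idx], Y[idx])
            elif models[c] is None:
                # Too few pairs and no earlier model: fall back to a global fit.
                models[c] = fit_linear(X, Y)
        # Per-pair squared conversion error under every cluster's regression.
        errors = np.stack(
            [np.sum((predict(W, X) - Y) ** 2, axis=1) for W in models], axis=1
        )
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return models, labels


def total_error(models, labels, X, Y):
    """Sum of squared conversion errors under the given assignment."""
    return sum(
        float(np.sum((predict(models[c], X[labels == c]) - Y[labels == c]) ** 2))
        for c in range(len(models))
    )
```

In this sketch, `X` and `Y` would hold the paired source and target pitch feature vectors for one level of the hierarchy (one row per pair), and the procedure would be run separately at the sentence, word, and sub-syllable levels. Because each step can only keep or lower the summed squared error, the result is never worse on the training pairs than a single global regression.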
