Abstract

This paper presents a novel framework for voice conversion based on transformation functions derived from clustered sub-syllable spectral blocks. The speech signal is first transformed into a spectrum by the fast Fourier transform (FFT). A sonority measure, computed as the concentration of energy among frequency components, is used to extract sub-syllable segments from input utterances. Following the syllable structure of Mandarin, Hidden Markov Model (HMM) based syllable clustering is used to handle the variability among different syllables. Dynamic programming is applied to align the spectral blocks of the parallel corpus. This alignment constrains the mapping between the spectral units of the source speaker and those of the target speaker so that mapped units belong to the same sub-syllable and the same sub-band of the Mel-scale filter bank. In the transformation phase, a content-based image retrieval algorithm is employed to find the target spectral block. This paper thus illustrates voice conversion by spectral block transformation, which converts the speech signal of the source speaker into that of the target speaker. Experimental results show that the proposed method is effective for voice conversion and that its discrimination with regard to speaker identification is better than that of traditional approaches. However, additional noise remains, especially in the high-frequency components, which degrades signal quality in the transformation phase because the speech is not smooth.
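The abstract names several processing steps but does not give their formulas. The following is a minimal pure-Python sketch of three of them, under stated assumptions. It uses a radix-2 FFT to produce a power spectrum. The energy-concentration score is a hypothetical stand-in: one minus the normalized spectral entropy, assumed here because the paper's actual sonority measure is not specified in the abstract. The alignment step is a DTW-style dynamic program over sequences of spectral feature vectors. The function names `fft`, `power_spectrum`, `energy_concentration`, and `dtw_align` are illustrative only and are not taken from the paper.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(frame):
    """Power of the non-negative frequency bins 0 .. N/2 of one frame."""
    spec = fft(frame)
    return [abs(c) ** 2 for c in spec[: len(frame) // 2 + 1]]


def energy_concentration(power):
    """Hypothetical sonority score in [0, 1]: 1 - normalized spectral entropy.

    Energy packed into a few bins (e.g. a strong harmonic) scores near 1;
    energy spread evenly across all bins (noise-like) scores near 0.
    This is an assumed stand-in, not the paper's exact measure.
    """
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power if p > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(power))


def dtw_align(src, tgt):
    """Dynamic-programming (DTW) alignment of two feature-vector sequences.

    Returns (total_cost, path), where path lists aligned (src_idx, tgt_idx).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) along the cheapest predecessor to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

To respect the constraint described in the abstract, one would presumably call `dtw_align` separately for each matching (sub-syllable, Mel sub-band) pair of source and target blocks instead of over a whole utterance. The HMM clustering and content-based image retrieval steps are not sketched here.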
