Abstract

Voice impersonation, or voice morphing, is a technique for modifying a source voice so that it sounds like a desired target voice. The input signal undergoes a transformation that produces output perceived as if it were spoken by the specified target speaker. Voice impersonation has many uses, for example dubbing character voices in films, voice restoration in old documents or movies, voice conversion, and creating speech databases for speech synthesis. A number of techniques for voice morphing have been devised over the years. The first converts voice on the basis of articulatory movement (AM) to vocal tract mapping (VTP): an artificial neural network maps AM to VTP and is used to convert the source voice to the target voice. The second applies a linear transformation approach: the source and target speakers’ voices are matched, and linear transformations estimated from time-aligned parallel training data are applied to obtain the morphed voice. The third, which we have implemented, is based on reconstruction: it learns the source voice in order to reconstruct it and then converts it into the desired target voice. It involves three stages: filter analysis, voice de-filtering, and voice conversion. In this paper we review these techniques in detail and provide a comparative study of the efficiency of each.
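
As a rough illustration of the second technique, the sketch below estimates one global affine transformation from time-aligned source and target feature frames by least squares and then applies it to the source frames. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the function names `fit_linear_transform` and `convert` are hypothetical, the frames are synthetic random vectors standing in for spectral features, and a single global transform is a simplification.

```python
import numpy as np

def fit_linear_transform(source, target):
    """Least-squares fit of target ~ source @ W + b for time-aligned frames.

    source, target: arrays of shape (n_frames, n_features), where row i of
    each array is assumed to describe the same instant of speech.
    """
    # Append a column of ones so the bias b is estimated together with W.
    ones = np.ones((source.shape[0], 1))
    augmented = np.hstack([source, ones])
    solution, *_ = np.linalg.lstsq(augmented, target, rcond=None)
    return solution[:-1], solution[-1]  # W, b

def convert(frames, W, b):
    """Apply the estimated transformation to source frames."""
    return frames @ W + b

# Synthetic stand-in for parallel training data (assumption: real systems
# would use aligned spectral features extracted from recorded speech).
rng = np.random.default_rng(0)
source_frames = rng.normal(size=(200, 4))
true_W = rng.normal(size=(4, 4))
true_b = rng.normal(size=4)
target_frames = source_frames @ true_W + true_b

W, b = fit_linear_transform(source_frames, target_frames)
converted = convert(source_frames, W, b)
```

Because the synthetic target frames are an exact affine function of the source frames, the fit recovers the transform almost exactly here. Real parallel speech data would leave a residual error.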
