Abstract

Text-independent Voice Conversion (VC) has attracted growing research interest over the last decade. Aligning the source and target speakers' spectral features before learning the mapping function is the challenging step in text-independent VC, because the two speakers have uttered different utterances, from the same or different languages. The state-of-the-art alignment technique is the Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA) algorithm, which iteratively learns the mapping function after obtaining nearest-neighbor-aligned feature pairs from the intermediate converted spectral features and the target spectral features. To the best of the authors' knowledge, this algorithm has been shown to converge empirically; however, a theoretical proof of its convergence has not been discussed in detail in the VC literature. In this paper, we show that the INCA algorithm converges monotonically to a local minimum in the mean square error (MSE) sense. In addition, we present the reason for convergence in the MSE sense in the context of the VC task.
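The alternation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inca_sketch`, the one-directional nearest-neighbor search, and the affine least-squares mapping (swapped in for whatever conversion function the paper uses) are all assumptions made for illustration. In this simplified setting, each step cannot increase the MSE given the output of the other, which is the usual intuition behind monotone convergence; the paper's own argument is not reproduced here.

```python
import numpy as np

def inca_sketch(source, target, n_iter=10):
    """Hypothetical simplified INCA-style loop (illustrative assumption only).

    source: (n, d) array of source spectral features
    target: (m, d) array of target spectral features
    Returns the affine map A of shape (d + 1, d) and the MSE after each iteration.
    """
    n, d = source.shape
    # Augment with a constant column so that X @ A is an affine map.
    X = np.hstack([source, np.ones((n, 1))])
    # Start from the identity mapping (converted features == source features).
    A = np.vstack([np.eye(d), np.zeros((1, d))])
    errors = []
    for _ in range(n_iter):
        converted = X @ A
        # Nearest-neighbor search step: pair each converted source frame
        # with its closest target frame (squared Euclidean distance).
        dist2 = ((converted[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        aligned = target[dist2.argmin(axis=1)]
        # Conversion step: refit the mapping from the ORIGINAL source
        # features to the aligned targets by least squares.
        A, *_ = np.linalg.lstsq(X, aligned, rcond=None)
        errors.append(float(np.mean(np.sum((X @ A - aligned) ** 2, axis=1))))
    return A, errors

# Toy usage with synthetic, non-parallel data (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 4))
tgt = 1.5 * rng.normal(size=(250, 4)) + 0.5
A, errors = inca_sketch(src, tgt)
```

In this sketch the recorded `errors` sequence should be non-increasing up to floating-point tolerance, because the identity initialization lies in the affine family and both steps are exact minimizations of the same squared-error objective.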
