Abstract

This paper addresses the pronunciation conversion (PC) task: reducing non-native accents in speech while preserving the original speaker identity. Although PC can be regarded as a special class of voice conversion (VC), a straightforward application of conventional VC methods to a PC task would not be successful, since VC may also change the original speaker identity of the input speech. This problem arises because two functions, namely an accent conversion function and a speaker similarity conversion function, are entangled in an acoustic feature mapping function. This paper proposes dynamic frequency warping (DFW)-based spectral conversion to solve this problem. The proposed DFW-based PC converts the pronunciation of input speech by relocating the formants to the positions where native speakers tend to locate their formants. We expect the speaker identity to be preserved because other factors, such as formant powers, are kept unchanged. In a low frequency domain, evaluation results confirmed that DFW-based PC with spectral residual modeling achieved higher speaker similarity to the original speaker than a conventional GMM-based VC method, while reducing foreign accents to a comparable degree.
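
To make the idea of relocating formants concrete, the following is a minimal sketch of applying a frequency warping function to a spectral envelope. It is an illustration under stated assumptions, not the method of this paper: the function name `warp_envelope`, the piecewise-linear warping defined by hand-picked anchor points, and the synthetic single-peak envelope are all hypothetical. In particular, the sketch takes the warping function as given and does not show how DFW would estimate it.

```python
import numpy as np

def warp_envelope(envelope, src_anchors, dst_anchors):
    """Shift spectral content from src_anchors to dst_anchors.

    Frequencies are normalized to [0, 1]. Both anchor lists are assumed
    to be strictly increasing and to include the endpoints 0 and 1.
    """
    freqs = np.linspace(0.0, 1.0, len(envelope))
    # For each output frequency, look up the input frequency it is drawn
    # from (the inverse of the piecewise-linear map src -> dst).
    source_freqs = np.interp(freqs, dst_anchors, src_anchors)
    # Resample the envelope at those input frequencies.
    return np.interp(source_freqs, freqs, envelope)

# Synthetic envelope with one formant-like peak at normalized frequency 0.2.
freqs = np.linspace(0.0, 1.0, 101)
envelope = np.exp(-((freqs - 0.2) / 0.03) ** 2)

# Hypothetical warping that moves the peak from 0.2 to 0.3.
warped = warp_envelope(envelope, [0.0, 0.2, 1.0], [0.0, 0.3, 1.0])

print("peak before:", freqs[np.argmax(envelope)])
print("peak after: ", freqs[np.argmax(warped)])
```

In this sketch the peak changes position but keeps its height, which mirrors the expectation stated above that factors such as formant powers stay unchanged when only formant locations are moved.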
