Abstract

A state-of-the-art speech parameter conversion technique and its application to mapping between features of different modalities are reviewed. Many statistical approaches to parameter conversion have been studied, particularly for voice conversion in speech synthesis research. A typical method converts parameters frame by frame under the minimum mean square error criterion, using a Gaussian mixture model of the joint probability density of input and output parameters [Y. Stylianou et al., IEEE Trans. SAP, 6(2), 131-142 (1998)]. Although this method is reasonably effective, its conversion accuracy is degraded by problems inherent in the frame-based conversion process. Recently, a conversion method based on maximum likelihood estimation of a parameter trajectory has been proposed [T. Toda et al., IEEE Trans. ASLP, 15(8), 2222-2235 (2007)]. This method produces an appropriate converted parameter sequence by (1) using the statistics of both static and dynamic features and (2) considering the global variance of the converted parameters. It has been reported to be effective in several applications, such as spectral determination from articulatory movements, acoustic-to-articulatory inversion mapping, and conversion of body-transmitted speech into air-transmitted speech.
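
The frame-by-frame conversion described above can be illustrated with a minimal sketch. It assumes a joint Gaussian mixture model over the stacked vector z = [x; y] whose weights, means, and full covariances have already been trained; the function names (`gaussian_logpdf`, `mmse_convert_frame`, `mmse_convert`) and the argument layout are hypothetical and are not taken from the cited papers. Each frame is mapped to the conditional expectation E[y | x], which is the minimum mean square error estimate under the model. The sketch does not cover the trajectory-based method, which additionally uses dynamic features and global variance.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at one vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def mmse_convert_frame(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y].

    Assumed layout (illustrative): weights (M,), means (M, dx + dy),
    covs (M, dx + dy, dx + dy); dx is the input dimensionality.
    """
    n_mix = weights.shape[0]
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Unnormalized log posterior of component m given the input frame.
        log_post[m] = np.log(weights[m]) + gaussian_logpdf(x, mu_x, s_xx)
        # Conditional mean of y given x for component m.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    return post @ np.stack(cond_means)


def mmse_convert(X, weights, means, covs, dx):
    """Convert a sequence of frames X (T, dx), one frame at a time."""
    return np.stack(
        [mmse_convert_frame(x, weights, means, covs, dx) for x in X]
    )
```

Because every frame is converted independently of its neighbours, nothing in this sketch ties consecutive outputs together, which corresponds to the frame-based limitation the abstract points to.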
