Abstract

This work presents a fusion scheme for phone duration models (PDMs). Specifically, a support vector regression (SVR) fusion model is fed with the predictions of a group of independent PDMs operating in parallel. The American-English KED TIMIT and the Greek WCL-1 databases are used to evaluate the individual PDMs and the fusion scheme. The fusion scheme improves accuracy over the best individual model, achieving relative reductions of the mean absolute error (MAE) and the root mean square error (RMSE) of 1.9% and 2.0%, respectively, on KED TIMIT, and of 2.6% and 1.8%, respectively, on WCL-1. Moreover, to evaluate the impact of this accuracy improvement on synthetic speech, a perceptual evaluation test was performed. The test showed that the accuracy improvement achieved by the SVR fusion would contribute to improving the naturalness of synthetic speech.
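The relative error reductions quoted above compare the fused model's error with that of the best individual PDM. The following is a minimal sketch of how such figures could be computed, not the authors' evaluation code. The function names and the toy durations (in milliseconds) are invented for illustration and do not come from KED TIMIT or WCL-1.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted durations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted durations."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy phone durations in ms (hypothetical values, for illustration only).
reference = [80.0, 120.0, 60.0, 100.0]    # measured durations
best_single = [90.0, 110.0, 70.0, 90.0]   # best individual PDM (assumed)
fused = [85.0, 115.0, 65.0, 95.0]         # fusion model output (assumed)

mae_gain = relative_reduction(mae(reference, best_single), mae(reference, fused))
rmse_gain = relative_reduction(rmse(reference, best_single), rmse(reference, fused))
print(f"relative MAE reduction:  {mae_gain:.1f}%")
print(f"relative RMSE reduction: {rmse_gain:.1f}%")
```

With real data, `reference` would hold the measured phone durations of a held-out test set, and the two prediction lists would come from the best individual PDM and from the fusion model respectively.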
