Abstract

In the present work, we propose a scheme for the fusion of different phone duration models, operating in parallel. Specifically, the predictions from a group of dissimilar and independent to each other individual duration models are fed to a machine learning algorithm, which reconciles and fuses the outputs of the individual models, yielding more precise phone duration predictions. The performance of the individual duration models and of the proposed fusion scheme is evaluated on the American-English KED TIMIT and on the Greek WCL-1 databases. On both databases, the SVR-based individual model demonstrates the lowest error rate. When compared to the second-best individual algorithm, a relative reduction of the mean absolute error (MAE) and the root mean square error (RMSE) by 5.5% and 3.7% on KED TIMIT, and 6.8% and 3.7% on WCL-1 is achieved. At the fusion stage, we evaluate the performance of 12 fusion techniques. The proposed fusion scheme, when implemented with SVR-based fusion, contributes to the improvement of the phone duration prediction accuracy over the one of the best individual model, by 1.9% and 2.0% in terms of relative reduction of the MAE and RMSE on KED TIMIT, and by 2.6% and 1.8% on the WCL-1 database.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.