Abstract

In data-driven, corpus-based text-to-speech synthesis systems, the main challenge is to select the most natural-sounding sequence of acoustic units, avoiding unnatural acoustic transitions and minimizing acoustic mismatches at the concatenation points. Unit selection algorithms that incorporate unit selection cost functions are known to synthesize speech close to natural quality. However, these algorithms operate over large acoustic inventories containing a huge number of acoustic units in a broad spectrum of linguistic, prosodic, and acoustic contexts, and therefore face a huge number of concatenation possibilities. Moreover, the shape of the unit selection cost function, which evaluates the cost of concatenating two consecutive acoustic units, is modelled manually in a time-consuming and laborious iterative process based on subjective evaluation. Since this process must be repeated for every new acoustic inventory, and even after changes to an existing one, we instead propose a new fuzzy unit selection cost function. We further propose to optimize the shape of the fuzzy unit selection cost function fully automatically for the context of the given acoustic inventory by using a relaxed gradient descent algorithm, in which the subjective tests are replaced by a novel objective measure of unit selection cost function performance. Furthermore, the proposed approach is fully interpretable and offers insight into which parts of the fuzzy unit selection cost function’s shape could be further improved. The experiments show that the optimized fuzzy unit selection cost function significantly outperforms the baseline fuzzy unit selection cost function. Moreover, the results show that the unit selection optimization algorithm is capable of finding the optimal shape of the fuzzy unit selection cost function, even when optimized over a small subset of sentences.
