Abstract

We propose a new approach to rapidly identifying adequate synthesis units in extremely large speech corpora. Our aim is to develop a concatenative speech synthesis system with high performance, in both speech quality and throughput, for a variety of practical applications. Very large speech corpora make more natural-sounding synthesized speech possible; the downside is that locating the required synthesis units takes longer. The key to overcoming this problem is to introduce state-of-the-art database retrieval technologies. The first selection step, based on a simple hash search, tabulates all synthesis unit candidates. The second step selects the N best candidates using nearest neighbor search, a typical database retrieval technique. Finally, the best sequence of synthesis units is determined by a Viterbi search. A runtime measurement test and a subjective experiment were carried out. For a 15-hour corpus, their results confirm that the proposed approach reduces runtime by about 40% compared with hash search alone, with no degradation in the quality of the synthesized speech.
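The abstract does not specify the unit features, distance measure, nearest neighbor index, or cost functions, so the following is only a minimal Python sketch of the three-stage selection under assumed choices. Units are hashed by a phoneme-like label (step 1). The N best candidates per slot are found by brute-force Euclidean nearest neighbor search, a simple stand-in for whatever retrieval structure the authors used (step 2). A Viterbi search then minimizes the sum of target cost and a weighted join cost (step 3). The names `Unit`, `build_hash_index`, `n_best`, `join_cost`, `viterbi`, `select_units` and all data values are hypothetical.

```python
import math
from collections import defaultdict, namedtuple

# Hypothetical unit record: a label (e.g. a phoneme), a target-feature vector,
# and scalar boundary values used for the assumed join (concatenation) cost.
Unit = namedtuple("Unit", "uid label feats left right")

def build_hash_index(corpus):
    """Step 1: hash units by label so all candidates for a label are one lookup away."""
    index = defaultdict(list)
    for unit in corpus:
        index[unit.label].append(unit)
    return index

def dist(a, b):
    """Euclidean distance, used here as the assumed target cost."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def n_best(candidates, target, n):
    """Step 2: keep the n candidates nearest the target features.
    Brute force for clarity; a large corpus would need a spatial index."""
    return sorted(candidates, key=lambda u: dist(u.feats, target))[:n]

def join_cost(prev, cur):
    """Assumed join cost: mismatch between adjacent unit boundaries."""
    return abs(prev.right - cur.left)

def viterbi(lattice, targets, w_join=1.0):
    """Step 3: pick one unit per slot minimizing total target + weighted join cost."""
    cost = [dist(u.feats, targets[0]) for u in lattice[0]]
    back = []  # back[t-1][j] = best predecessor index for lattice[t][j]
    for t in range(1, len(lattice)):
        prev = lattice[t - 1]
        new_cost, ptr = [], []
        for u in lattice[t]:
            i = min(range(len(prev)),
                    key=lambda k: cost[k] + w_join * join_cost(prev[k], u))
            new_cost.append(cost[i] + w_join * join_cost(prev[i], u)
                            + dist(u.feats, targets[t]))
            ptr.append(i)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final state.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [lattice[t][j] for t, j in enumerate(path)], min(cost)

def select_units(index, labels, targets, n=2, w_join=1.0):
    """Run all three steps; assumes every label has at least one unit in the corpus."""
    lattice = [n_best(index[lab], tgt, n) for lab, tgt in zip(labels, targets)]
    return viterbi(lattice, targets, w_join)

# Usage with a toy, made-up corpus of four units.
corpus = [
    Unit(0, "a", (1.0, 0.0), 0.0, 0.5),
    Unit(1, "a", (0.0, 1.0), 0.0, 0.9),
    Unit(2, "k", (1.0, 1.0), 0.5, 0.0),
    Unit(3, "k", (0.9, 1.1), 2.0, 0.0),
]
index = build_hash_index(corpus)
units, total = select_units(index, ["a", "k"], [(1.0, 0.0), (1.0, 1.0)], n=2)
print([u.uid for u in units], total)  # [0, 2] 0.0
```

The design point the abstract makes is visible in `select_units`: the Viterbi search only ever sees the `n` candidates per slot that survive step 2, so its cost grows with `n` rather than with the full number of hashed candidates for each label.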
