Abstract

Concatenative speech synthesizers with large databases have recently been widely developed for high-quality speech synthesis. However, some platforms require a speech synthesis system that works within a limited memory footprint or computational budget. In this paper, we propose a scalable concatenative speech synthesizer based on the plural speech unit selection and fusion method. To achieve scalability, we propose an offline unit fusion method in which the pitch-cycle waveforms of voiced segments are fused in advance. Experimental results show that the synthetic speech of the offline unit fusion method with a half-size waveform database is comparable to that of the online unit fusion method, while the computational cost is reduced to 1/10.
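To illustrate the idea of fusing pitch-cycle waveforms in advance, the following is a minimal sketch, not the method of the paper. It assumes that fusion is a plain sample-wise average of the selected pitch-cycle waveforms and that shorter cycles are centred and zero-padded to the longest length; the abstract does not state how fusion or alignment is performed. The names `fuse_pitch_cycles` and `build_offline_database` are hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_pitch_cycles(cycles: Sequence[Sequence[float]]) -> List[float]:
    """Fuse several pitch-cycle waveforms into one by sample-wise averaging.

    Assumption (not from the paper): shorter cycles are centred and
    zero-padded to the length of the longest cycle before averaging.
    """
    if not cycles:
        raise ValueError("need at least one pitch-cycle waveform")
    length = max(len(c) for c in cycles)
    fused = [0.0] * length
    for cycle in cycles:
        offset = (length - len(cycle)) // 2  # centre the shorter cycle
        for i, sample in enumerate(cycle):
            fused[offset + i] += sample
    return [s / len(cycles) for s in fused]


def build_offline_database(
    candidates: Dict[str, List[List[float]]]
) -> Dict[str, List[float]]:
    """Fuse each unit's candidate waveforms once, ahead of synthesis time.

    `candidates` maps a hypothetical unit identifier to the pitch-cycle
    waveforms that would otherwise be fused online for that unit.
    """
    return {unit_id: fuse_pitch_cycles(cs) for unit_id, cs in candidates.items()}


if __name__ == "__main__":
    db = build_offline_database({"a": [[1.0, 3.0], [3.0, 5.0]]})
    print(db["a"])
```

Under these assumptions, the synthesizer only looks up an already fused waveform at run time, which is the kind of trade the offline variant makes: fusion work moves out of the synthesis loop, and only the fused waveforms need to be stored.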
