Abstract

Time-domain pitch-synchronous overlap-add (TD-PSOLA) is the most widely used technique in commercial concatenative text-to-speech (TTS) synthesis systems. However, TD-PSOLA has several well-known drawbacks. To overcome some of them, this work presents a method based on time-frequency interpolation (TFI) [Yair Shoham]. The method is a pitch-synchronous time-frequency formulation of the waveform interpolation (WI) technique [Bastian Kleijn]. The goal of this work is to show that TFI offers important advantages for concatenative TTS synthesis. It allows pitch-scale modification (PSM) to be performed independently of time-scale modification (TSM) in a straightforward manner and with high quality. Both TSM and PSM can be applied continuously, with no limit imposed by pitch-period resolution. Moreover, TFI allows simple, flexible, and efficient procedures for smoothing the boundaries between diphones (or any other kind of unit). The proposed system was evaluated using diphones and prosody generated by the Festival system [Alan Black, Paul Taylor]. Subjective tests comparing the proposed TFI system with a standard TD-PSOLA system showed that the proposed system delivers superior quality.
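The idea of modifying pitch independently of duration by interpolating pitch cycles can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `resample_cycle` and `synthesize`, the parameter `n_phase`, the linear cross-fade between two cycles, and the linear pitch-period contour are all assumptions chosen for illustration. Each pitch cycle is resampled onto a common phase grid, and the output is read from the interpolated cycles at a phase that advances according to the target pitch period, so the output length and the pitch contour are set separately.

```python
import numpy as np

def resample_cycle(cycle, n_phase):
    """Resample one pitch cycle onto n_phase uniform phase points (periodic)."""
    cycle = np.asarray(cycle, dtype=float)
    src = np.arange(len(cycle)) / len(cycle)   # original phase positions in [0, 1)
    dst = np.arange(n_phase) / n_phase         # common phase grid in [0, 1)
    return np.interp(dst, src, cycle, period=1.0)

def synthesize(cycle_a, cycle_b, n_samples, period_start, period_end, n_phase=256):
    """Hypothetical sketch: cross-fade from cycle_a to cycle_b over n_samples
    while the pitch period (in samples) moves linearly from period_start
    to period_end."""
    a = resample_cycle(cycle_a, n_phase)
    b = resample_cycle(cycle_b, n_phase)
    grid = np.arange(n_phase) / n_phase

    t = np.arange(n_samples)
    alpha = t / max(n_samples - 1, 1)          # 0 at the start, 1 at the end
    periods = (1.0 - alpha) * period_start + alpha * period_end

    # Phase (in cycles) advances by 1/period per sample; it starts at 0.
    phase = np.concatenate(([0.0], np.cumsum(1.0 / periods)[:-1])) % 1.0

    # Read both cycles at the current phase, then interpolate between them in time.
    out_a = np.interp(phase, grid, a, period=1.0)
    out_b = np.interp(phase, grid, b, period=1.0)
    return (1.0 - alpha) * out_a + alpha * out_b

# Usage: an 80-sample cycle rendered at a 100-sample period for 300 samples.
cycle = np.sin(2 * np.pi * np.arange(80) / 80)
signal = synthesize(cycle, cycle, n_samples=300, period_start=100.0, period_end=100.0)
```

In this sketch the duration is set only by `n_samples` and the pitch only by the period contour, and the period may take any real value because the phase is continuous rather than tied to integer sample positions.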
