Abstract

Building a text-to-speech synthesizer that is both intelligible and natural has been the ultimate goal of researchers for the past 30 years, and concatenative synthesis produces the most natural-sounding speech. Concatenative synthesis, however, commonly introduces distortions at the concatenation points, which are heard as clicks in the synthesized speech. Several signal processing algorithms exist to address this problem, including TDPSOLA, FDPSOLA, and MBROLA. This paper addresses audible discontinuities at the diphone concatenation points in the Bengali speech synthesizer Subachan and resolves them using TDPSOLA. To do so, we detected the correct pitch mark locations of the diphones, classified their speech frames as voiced or unvoiced, and concatenated the diphones using TDPSOLA after rescaling them to remove energy mismatches. As a result, the audible clicks at the concatenation points are removed and the quality of the generated speech improves markedly.
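
The energy rescaling and joining steps can be illustrated with a minimal sketch. This is not the Subachan implementation and not full TDPSOLA: it skips pitch mark detection and voiced/unvoiced classification, and it joins two units with a plain overlap-add cross-fade rather than pitch-synchronous windows. The function names (rms, rescale_to_match, overlap_add_join) and the overlap length n are assumptions made for illustration only.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rescale_to_match(left, right, n):
    """Scale `right` so its first n samples match the RMS of the last n of `left`.

    Hypothetical helper: a simple stand-in for removing the energy
    mismatch between two units before they are joined.
    """
    target = rms(left[-n:])
    current = rms(right[:n])
    if current == 0.0:
        # A silent boundary cannot be scaled; return an unchanged copy.
        return list(right)
    gain = target / current
    return [s * gain for s in right]


def overlap_add_join(left, right, n):
    """Join two units by cross-fading n samples with raised-cosine weights.

    The weights applied to `left` and `right` sum to 1 at every sample,
    so the join has no abrupt amplitude step (the source of a click).
    """
    if n <= 0 or n > min(len(left), len(right)):
        raise ValueError("n must be between 1 and the shorter unit length")
    out = list(left[:-n])
    for i in range(n):
        w = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / n))  # rises from 0 to 1
        out.append((1.0 - w) * left[len(left) - n + i] + w * right[i])
    out.extend(right[n:])
    return out


# Usage: two synthetic "units" with a 4x amplitude mismatch.
left = [math.sin(2 * math.pi * i / 20) for i in range(100)]
right = [0.25 * math.sin(2 * math.pi * i / 20) for i in range(100)]
n = 20
scaled = rescale_to_match(left, right, n)
joined = overlap_add_join(left, scaled, n)
```

In a pitch-synchronous method such as TDPSOLA, the overlap windows would be centered on the detected pitch marks of voiced frames instead of being placed at a fixed offset, which is why correct pitch mark detection matters for the result described above.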
