Abstract

HMM based Bengali speech synthesis system (Bengali-HTS) generates highly intelligible synthesized speech but its naturalness is not adequate even though it is trained with a very good amount of speech corpus. In case of interrogative, imperative and exclamatory sentences, naturalness of the synthesized speech falls drastically. This paper proposes a method to overcome this problem by modifying the $$\hbox {F}_{0}$$F0 contour of synthetic speech based on Fujisaki model. The Fujisaki model features for different types of Bengali sentences are analyzed for the generation of $$\hbox {F}_{0}$$F0 contour. These features depend on prosodic word/phrase boundary of the sentence. So a two layer supervised classification and regression tree is trained to predict the prosodic word/phrase boundary. Fujisaki model then generates $$\hbox {F}_{0}$$F0 contour from input text using the prosodic word/phrase boundary and segmental duration information from HMM-based speech synthesis system. Moreover, for HMM training purpose, prosodic structure of sentence has been employed rather than lexical structure. From MOS and preference test it is found that proposed method significantly improved the overall quality of synthesized speech than that of Bengali-HTS.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call