Speech synthesis by rule based on VCV waveform synthesis units

Takao Koyama, Nubuo Koizumi

doi:10.1121/1.416345

Abstract

In recent years, several systematic speech synthesis methods for Japanese have been proposed that are based on the pitch-synchronous waveform superposition method (PSOLA). These methods generally use the phoneme or the syllable as the synthesis unit. Because they concatenate units at phoneme boundaries, noise introduced at the joins is a common problem. To solve this noise problem, a method based on vowel-consonant-vowel (VCV) synthesis units is proposed, in which the units are concatenated in the steady section of the vowel. The VCV method did not originally account for unvoiced vowels; to improve it, unvoiced vowels are treated as exceptional segments and included in the waveform database. When a VCV unit is divided into more primitive parts, it is useful to take spectral distortion into account when choosing segments. The method was evaluated with 50 synthesized voices, and the evaluation scores were found to correlate with spectral distortion.
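
The two ideas in the abstract, joining units inside the vowel steady section and using spectral distortion to choose among segments, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the RMS log-spectral distance, the naive DFT, and the plain linear cross-fade (used here in place of pitch-synchronous superposition) are all assumptions made for illustration.

```python
import cmath
import math


def log_spectrum(frame):
    """Log-magnitude spectrum (dB) of a frame, computed with a naive DFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Small floor avoids log10(0) for empty bins.
        spec.append(20.0 * math.log10(abs(acc) + 1e-9))
    return spec


def spectral_distortion(frame_a, frame_b):
    """RMS difference (dB) between the log spectra of two equal-length frames.

    Assumed distance measure; the abstract does not specify one.
    """
    sa, sb = log_spectrum(frame_a), log_spectrum(frame_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))


def pick_best_unit(prev_unit, candidates, frame_len):
    """Pick the candidate whose leading vowel frame is spectrally closest
    to the trailing vowel frame of the previous unit."""
    tail = prev_unit[-frame_len:]
    return min(candidates, key=lambda c: spectral_distortion(tail, c[:frame_len]))


def crossfade_join(left, right, overlap):
    """Join two units by linearly cross-fading `overlap` samples.

    A simplified stand-in for superposition in the vowel steady section;
    it is not pitch-synchronous.
    """
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be in 1..min(len(left), len(right))")
    faded = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the right-hand unit
        faded.append((1.0 - w) * left[len(left) - overlap + i] + w * right[i])
    return left[:-overlap] + faded + right[overlap:]


def tone(cycles_per_frame, length, frame_len=32):
    """Hypothetical test signal: a sine with a given number of cycles per frame."""
    return [math.sin(2 * math.pi * cycles_per_frame * t / frame_len) for t in range(length)]


prev_unit = tone(4, 64)
candidates = [tone(4, 64), tone(9, 64)]
best = pick_best_unit(prev_unit, candidates, frame_len=32)
joined = crossfade_join(prev_unit, best, overlap=16)
```

In this toy setup the candidate whose vowel spectrum matches the end of the previous unit is selected, and the join happens where the two waveforms are most similar, which is the intuition behind concatenating in the vowel steady section rather than at a phoneme boundary.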
