Abstract

In the speech synthesis model presented in this paper, voiced speech is synthesized as the sum of sinusoidally modulated two FM sinusoids corresponding to the first and second formants. Each FM signal is generated such that its amplitude is equal to the formant amplitude, its carrier frequency to the formant frequency or its linear combination, its modulation frequency to the pitch, and its modulation index to one fifth of the carrier to modulation frequency ratio. Unvoiced speech is generated by shifting the center frequency of a low-pass noise with a bandwidth of 1 KHz, to the frequency where the energy of the unvoiced speech is concentrated. The drawbacks of this scheme are that the pitch and the formant frequencies of the FM signals may deviate up to 40% and 9%, respectively, and spurious formants may occur. A hardware implementation can be accomplished by driving a linear analog circuitry which can simply be integrated on a single chip, by a digital computer which supplies voltages at every T = 5 ms corresponding to seven parameter values. Examples of the signals and spectrograms of synthesized speech obtained by both synthesis by analysis and synthesis by rule are given along with a set of rules for text-to-speech synthesis of Turkish. It is observed that the speech synthesized by analysis loses the speaker's identity but it is highly intelligible, while understanding the speech synthesized by rules requires a training period.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.