Synthesizing multistyle speech using the Klatt synthesizer

Daniel Lambert, Janet Rutledge, Kathleen Cummings, Mark Clements

doi:10.1121/1.408950

Abstract

Synthesizing multistyle speech has been an important topic of research in recent years. In this research, 11 commonly encountered speech styles were synthesized by varying pitch, duration, intensity, and glottal excitation on the Klatt Synthesizer 88 (KLSYN88). The 11 styles are angry, clear, 50% tasking, 70% tasking, fast, Lombard, loud, normal, question, slow, and soft. Subjective listening tests showed all of the styles to be intelligible and appropriately styled. The glottal excitation parameter variations are based on statistical analyses [K. Cummings, Analysis, Syn., and Rec. of Stressed Speech, Ph.D. thesis, Georgia Inst. of Tech., 1992] that demonstrated the importance of glottal excitation changes in styled speech and showed that the glottal excitation of each of the 11 styles is significantly and identifiably different. One utterance of the word "hot" was synthesized in the normal style. The other 10 styles of this word were synthesized by changing only the glottal excitation parameters of the normal utterance. Subjective listening tests with untrained subjects demonstrated that the 11 synthetic styles were perceptibly and identifiably different.
