A control program for a Glace-Holmes speech synthesizer

Yoshiyuki Horii

doi:10.3758/bf03201074

Abstract

Synthetic speech provides unique opportunities to investigate important acoustic cues in speech signals and the nature of speech perception in general. SYNG, the program described here, is a utility program that controls a parallel-resonance terminal-analog speech synthesizer, JAWORD (Glace, 1968). The synthesizer requires up to 11 inputs: 1 determines the type of excitation (noise source, pulse source, or a combination of the two), 1 sets the fundamental frequency, and the remaining 9 set the formant frequencies and their amplitudes. Synthesizer control by computer essentially involves (1) preparation of sets of 11 parameter values, each set representing one time segment of speech, and (2) sending these sets of values to the synthesizer through digital-to-analog converters under programmed timing control.

Input/Output. Input data, from a card reader, Teletype, or magnetic tape unit, specify each set of 11 values corresponding to a 10-msec sample of speech. An array of parameters for 600 time samples, which is equivalent to 6 sec of speech, can be accommodated at one time. With the use of disk storage, 20 such parameter arrays can be stored and synthesized in any order, extending the system capacity to handle up to 120 sec of speech. An option is also available to synthesize SOUGHS (Society of Users of Glace-Holmes Synthesizer) data punched on cards. When parameter values of some time segments are not available, the program provides either linear or exponential interpolation to generate the missing information in a manner similar to that described by Rabiner (1968). The program then converts the successive sets of parameter values into equivalent voltages that are transferred to the synthesizer. The transfer rate can be specified at the time of execution, so that the synthesized speech can be compressed or stretched in time. A silent interval between the repetitions is also specifiable.
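The interpolation and timing arithmetic described above can be illustrated with a short sketch. It is written in Python for readability rather than in the FORTRAN IV of SYNG itself, and it is not taken from the program listing. The function names, the use of `None` to mark a missing value, the `rate` argument, and the reading of "exponential" as geometric (log-domain) interpolation are all assumptions made for illustration; the scheme of Rabiner (1968) may well differ.

```python
FRAME_MS = 10          # from the abstract: one parameter set per 10-msec sample
FRAMES_PER_ARRAY = 600 # from the abstract: 600 time samples per array


def interpolate_track(track, mode="linear"):
    """Fill None gaps in one parameter track between known neighbors.

    Hypothetical helper. "exponential" is assumed here to mean geometric
    interpolation, which requires positive endpoint values.
    Gaps before the first or after the last known value are left as None.
    """
    filled = list(track)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        a, b = filled[left], filled[right]
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fractional position inside the gap
            if mode == "linear":
                filled[i] = a + (b - a) * t
            elif mode == "exponential":
                filled[i] = a * (b / a) ** t
            else:
                raise ValueError("mode must be 'linear' or 'exponential'")
    return filled


def utterance_seconds(n_frames, rate=1.0):
    """Playback duration when frames are sent `rate` times faster than
    real time (rate is an assumed way to express the transfer rate)."""
    return n_frames * FRAME_MS / 1000.0 / rate


if __name__ == "__main__":
    print(interpolate_track([100.0, None, 300.0]))
    print(interpolate_track([100.0, None, 400.0], mode="exponential"))
    print(utterance_seconds(FRAMES_PER_ARRAY))       # one array at normal rate
    print(20 * utterance_seconds(FRAMES_PER_ARRAY))  # 20 arrays from disk
```

Under these assumptions, one 600-frame array lasts 6 sec at the normal rate and 20 arrays last 120 sec, matching the capacities stated above; a `rate` above 1 compresses the utterance in time and a `rate` below 1 stretches it.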
Ongoing synthesis halts upon detection of an interrupt given manually by pressing a control button, at which point the user is offered several procedural options. For example, he can listen to any portion of the entire utterance at any desired transfer rate, he can obtain a printout of arrays of parameter values for any portion, or he can edit one or more parameter values for any segment. The edited portion can be synthesized alone or in the context of the entire utterance, or its parameter arrays can be written out and/or remodified. If desired, parameter values can be displayed as a function of time on an oscilloscope (one parameter) or on a graphic level recorder (up to eight parameters). The program is written in an interactive form, so that instructions to the user are typed out on the Teletype preceding each action required, and messages are given when illegal actions are taken. In contrast to synthesis by rule, in which input data typically specify a sequence of phonetic elements and accent and intonation codes (Mattingly, 1968), the present program utilizes parametric descriptions of utterances supplied by a primary recognition program (Hughes et al., 1969), which includes formant tracking and fundamental frequency tracking algorithms (Snow & Hughes, 1969).

Program Language and Computer. The program is written in FORTRAN IV for a CDC 1700 computer with 16 channels of digital-to-analog conversion capability.

Availability. The listing of the program is available free on request from the author.
