High quality system for speech transformations

Stephanie Seneff

doi:10.1121/1.2017669

Abstract

A speech analysis-synthesis system has been developed that can modify the excitation and the spectrum independently and then reconstruct speech from the modified components. Because pitch extraction is not required, the reconstructed speech sounds more natural than vocoded speech. At the core of the system is a phase vocoder. The excitation spectrum is obtained by dividing the magnitude spectrum by the vocal tract spectral model, and is then either duplicated or foreshortened to alter the number of harmonics present. Correct frequencies are restored by multiplying the unwrapped phase characteristics by the appropriate constant. The desired vocal tract shaping is then reintroduced, using interpolation between samples if necessary. The system can produce several potentially useful modifications to the voice. A male voice can be converted into a female‐like voice, and vice versa. Alternatively, the spectrum can be compressed by a factor of three or more while the pitch remains unchanged, a modification potentially useful for persons with high-frequency hearing loss. A third possibility is the restoration of helium speech. In addition, the parameters of the system could be encoded to realize a voice‐excited vocoder. A tape illustrating the system's capabilities will be played.
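
As a rough illustration of the spectral compression case described above (envelope compressed, pitch unchanged), the following single-frame sketch may help. It is not the author's implementation: the abstract does not say how the vocal tract spectral model is estimated, so cepstral smoothing is assumed here as a stand-in, and the function names, the lifter length `n_lifter`, and the Hann window are all hypothetical choices. The harmonic duplication and phase-scaling steps are not shown.

```python
import numpy as np

def cepstral_envelope(mag, n_lifter=30):
    """Smooth spectral envelope via low-quefrency liftering.
    ASSUMPTION: stands in for the abstract's 'vocal tract spectral model'."""
    log_mag = np.log(np.maximum(mag, 1e-10))
    cep = np.fft.irfft(log_mag)            # real cepstrum of the one-sided spectrum
    cep[n_lifter + 1 : -n_lifter] = 0.0    # keep only the low quefrencies (symmetric)
    return np.exp(np.fft.rfft(cep).real)

def compress_envelope(env, factor):
    """Compress the envelope along frequency: new_env[k] = env[k * factor],
    using linear interpolation between samples where needed."""
    bins = np.arange(len(env))
    return np.interp(bins * factor, bins, env)

def modify_frame(frame, factor, n_lifter=30):
    """Divide out the envelope, then reimpose a compressed one.
    The excitation (and hence the pitch) is left untouched."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag, phase = np.abs(spec), np.angle(spec)
    env = cepstral_envelope(mag, n_lifter)
    excitation = mag / env                 # flattened excitation spectrum
    new_mag = excitation * compress_envelope(env, factor)
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(frame))

# Hypothetical usage: a 100 Hz harmonic test signal sampled at 8 kHz.
t = np.arange(512) / 8000.0
frame = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 20))
compressed = modify_frame(frame, factor=3.0)
```

With `factor=1.0` the function returns the windowed input frame unchanged, which is a convenient sanity check. A full system would process overlapping frames and resynthesize by overlap-add.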
