Whispering is an indispensable mode of speech communication, especially for private conversation or human-machine interaction in public places such as libraries and hospitals. Whispering is unvoiced pronunciation, and unvoiced sounds are usually regarded as noise-like signals. However, unvoiced sound has distinctive acoustic features and carries enough information for effective communication. Although whispering is a significant form of communication, there is currently far less research on whispered signals than on normal speech and voiced pronunciation. Our work extends the study of unvoiced pronunciation signals by introducing a novel signal feature, which is then applied to unvoiced signal modeling and whisper synthesis. The amplitude statistics of each frequency component are studied individually, and on this basis a new feature, the “consistent standard deviation coefficient”, is identified in the amplitude spectrum of unvoiced pronunciation. Based on this feature, a synthesis method for unvoiced pronunciation is proposed; it is implemented with the STFT, using an artificially generated short-time spectrum with random amplitude and phase. The synthesized signals are perceptually indistinguishable in auditory quality from the original pronunciation and have an autocorrelation similar to that of the original signal, which demonstrates the effectiveness of the proposed stochastic model of the short-time spectrum for unvoiced pronunciation.
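The synthesis idea summarized above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that the "consistent standard deviation coefficient" means the ratio of standard deviation to mean amplitude is the same in every frequency bin, and it assumes a gamma distribution for the random amplitudes purely because that makes the ratio easy to fix; the abstract does not state the distribution used. The function name `synthesize_unvoiced`, its parameters, the Hann window, and the overlap-add reconstruction are all hypothetical choices made for illustration.

```python
import numpy as np

def synthesize_unvoiced(mean_amp, cv, n_frames, hop, rng):
    """Sketch: build a random short-time spectrum and overlap-add it.

    mean_amp : per-bin mean amplitude, length n_fft // 2 + 1 (assumed known,
               e.g. measured from a recorded unvoiced sound)
    cv       : std/mean ratio shared by all bins (our reading of the
               "consistent standard deviation coefficient")
    """
    n_bins = mean_amp.shape[0]
    n_fft = 2 * (n_bins - 1)

    # Assumption: gamma-distributed amplitudes. With shape k = 1 / cv**2 and
    # scale = mean / k, every bin has the requested mean and std/mean == cv.
    k = 1.0 / cv**2
    amp = rng.gamma(shape=k, scale=mean_amp / k, size=(n_frames, n_bins))

    # Random phase, uniform on [-pi, pi), drawn independently per bin and frame.
    phase = rng.uniform(-np.pi, np.pi, size=(n_frames, n_bins))
    spec = amp * np.exp(1j * phase)

    # Inverse FFT of each frame, windowed, then overlap-add (a simple inverse STFT).
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * np.hanning(n_fft)
    out = np.zeros(hop * (n_frames - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
    return out, amp

# Example with a made-up spectral envelope (129 bins, so n_fft = 256).
rng = np.random.default_rng(0)
mean_amp = np.linspace(1.0, 0.1, 129)
signal, amp = synthesize_unvoiced(mean_amp, cv=0.5, n_frames=2000, hop=64, rng=rng)
```

In practice `mean_amp` and `cv` would be estimated from the STFT magnitudes of a recorded whisper segment; the values here are placeholders and carry no experimental meaning.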