Psychoacoustical evaluation of PSOLA. II. Double-formant stimuli and the role of vocal perturbation.

Reinier W. L. Kortekaas, Armin Kohlrausch

doi:10.1121/1.424588

Abstract

This article presents the results of listening experiments and psychoacoustical modeling aimed at evaluating the pitch synchronous overlap-and-add (PSOLA) technique. This technique can be used to modify the pitch and duration of natural speech simultaneously, using simple and efficient time-domain operations on the speech waveform. The first set of experiments tested the ability of subjects to discriminate double-formant stimuli, modified in fundamental frequency using PSOLA, from unmodified stimuli. Of the potential auditory discrimination cues induced by PSOLA, cues from the first formant were generally found to dominate discrimination performance. In the second set of experiments, the influence of vocal perturbation, i.e., jitter and shimmer, on the discriminability of PSOLA-modified single-formant stimuli was determined. The data show that discriminability deteriorates at most modestly in the presence of jitter and shimmer. With the exception of a few conditions, the trends in these data could be replicated using either a modulation-discrimination or an intensity-discrimination model, depending on the formant frequency. As a baseline experiment, detection thresholds for jitter and shimmer were measured. Thresholds for jitter could be replicated using either the modulation-discrimination or the intensity-discrimination model, depending on the (mean) fundamental frequency of the stimuli. The thresholds for shimmer could be accurately predicted for stimuli with a 250-Hz fundamental, but less accurately for stimuli with a 100-Hz fundamental.
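The time-domain pitch modification that PSOLA performs can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: it assumes the analysis pitch marks (one per period) are already known, extracts Hann-windowed segments two local periods long around each mark, and overlap-adds them at synthesis marks spaced by the local period divided by the desired F0 ratio, so duration is preserved while F0 changes. The function names `hann` and `psola_pitch_shift` are hypothetical.

```python
import math
from typing import List, Sequence


def hann(length: int) -> List[float]:
    """Symmetric Hann window; equals 1.0 at the centre and 0.0 at both ends."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def psola_pitch_shift(signal: Sequence[float], marks: Sequence[int],
                      factor: float) -> List[float]:
    """Sketch of TD-PSOLA F0 modification with unchanged duration.

    signal : input samples
    marks  : sorted analysis pitch marks (sample indices), one per period
             (assumed to be given; pitch marking itself is not shown)
    factor : new_F0 / old_F0 (> 1 raises pitch, < 1 lowers it)
    """
    out = [0.0] * len(signal)
    interior = range(1, len(marks) - 1)  # marks that have both neighbours
    t = float(marks[1])                  # first synthesis mark
    while t <= marks[-2]:
        # Re-use the analysis segment whose mark is closest to the synthesis mark.
        i = min(interior, key=lambda k: abs(marks[k] - t))
        # Local period estimated from the two neighbouring analysis marks.
        period = int(round((marks[i + 1] - marks[i - 1]) / 2))
        win = hann(2 * period + 1)
        centre = int(round(t))
        # Overlap-add the windowed two-period segment at the synthesis mark.
        for n in range(-period, period + 1):
            src, dst = marks[i] + n, centre + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += signal[src] * win[n + period]
        # Synthesis marks are spaced by period / factor.
        t += period / factor
    return out
```

For example, applying `psola_pitch_shift` with `factor=1.25` to a 100-Hz impulse train sampled at 8 kHz (period 80 samples) yields pulses spaced 64 samples apart, i.e. a 125-Hz train of the same length. Real speech would additionally require a pitch-marking stage and a strategy for duplicating or dropping segments when duration is also modified.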
