Formant trajectories are excellent vowel discriminants; within a vowel category, they are nearly constant across speaker size, age, and sex, and across consonantal contexts. However, a formant-based model of vowel perception assumes that formant peaks are perceptually important and that human listeners track formant‐frequency changes across time. Speech‐recognition applications have avoided formant frequencies because reliable formant tracking is difficult. In addition, it is not known whether human listeners actually follow formants perceptually across time. This paper presents results from several studies that examine the relationship between changing formant frequencies and perception. Alternative perceptual representations of vowels, such as global spectral shape, are precluded by evidence that individual formant amplitudes are largely ignored in vowel perception. In addition, where other spectral properties appear to have a perceptual effect, it is because the stimuli used formants that do not change. When formants are changing, the perceptual effects of spectral‐shape properties disappear. In terms of human formant tracking, perceptual extrapolation of a formant sweep depends mostly on peak frequency and not on other properties related to spectral shape. This demonstrates that listeners do indeed follow formant‐frequency changes as auditory objects. Further research on formant‐frequency perception will be described.