The resonant frequencies of the vocal tract during vowel production convey information about the linguistic vowel intended by the talker (whether they mean to say ‘hey’ or ‘hoe’, for example) while also conveying information about the talker. One particularly salient piece of talker information that partially determines the frequencies of the vowel formants is the length of the talker’s vocal tract. Vowel formant normalization aims to remove the effects of talker differences without also removing important linguistic information. This paper presents a study of vocal tract length normalization using a new ΔF method, and compares this method to other vowel normalization methods. A key point of comparison in this study is the number of vowel tokens needed to derive a stable estimate of vocal tract length. Several of the vowel normalization methods most commonly used in phonetic studies are shown to need a full set of vowels to be reliable, while methods that derive vocal tract length information from the full acoustic spectrum are much more stable and may even provide a length-normalized representation that could be cognitively computed and used in human speech perception.