ABSTRACT Speech phoneme perception can be biased to follow production expectations. We propose that some vowel phonemes (such as /i/ as in “deed” and /Ʌ/ as in “dud”) sound, respectively, higher and lower in pitch in part because they are typically vocalized at higher and lower f0s, and listener pitch perception is biased toward those production tendencies. Resonant sounds, like vowels, possess two principal spectral pitch qualities: musical note/chroma (reflecting f0) and tone height (reflecting spectral centroid). Participants could exhibit perceptual biases to experience vowel phonemes at differing tone heights related to their typically voiced f0s in speech and song, even when the vowels are f0-matched. Experiment 1 measures the similarity of 12 f0-matched vowel sounds and reconfirms two principal spectral timbre dimensions: tone height/brightness (spectral centroid), which is related to pitch, and harmonicity/consonance (harmonic overtone alignment). Experiment 2 finds that participants vocally mimic high- and low-frequency sinusoidal sounds using the phonemes at the height extremes: an 8 kHz sinusoid elicits /i/ and a 60 Hz sinusoid elicits /Ʌ/. Participants also mimic consonant and dissonant sounds using the phonemes at the harmonicity extremes: a 256 Hz sinusoid elicits /u/ (“dude”) and an inharmonic cicada call elicits /æ/ (“dad”). Experiment 3 confirms that, in freeform speech and scat singing, the vowel phonemes /i/, /I/ (“did”), and /Ʌ/ exhibit characteristic f0s that correlate with the height ratings from Experiment 1. The overall findings confirm a natural regularity: vowel sounds have systematically different f0s in speech and freeform song. Listeners incorporate this pattern as a perceptual bias when rating the tone height of f0-matched vowel phonemes, and in production when mimicking non-speech sounds, consistent with the scene-parsing principle that perceptual biases match natural acoustic patterns.