UC Berkeley Phonology Lab Annual Report (2013)

The predominant pitch of semivowels 1

Greg Finley
University of California, Berkeley

A remarkable characteristic of speech perception is that it can be active even when listening to nonspeech. Numerous studies have shown speech perception operating to some degree on nonspeech, using a wide variety of stimuli and conditions (Remez & Rubin 1993, Shannon et al. 1995, Liebenthal et al. 2003, Berent et al. 2010, Iverson et al. 2011, Finley 2012), demonstrating either that nonspeech is intelligible as speech or that speech and language abilities affect the processing of nonspeech sounds. From a perspective that considers speech perception a cognitive process operating on auditory input, there is evidence that the computation of linguistic outputs can be performed on a range of inputs not restricted to speech (Berent et al. 2010). This suggests that the effects observed in these studies all reflect the same cognitive module responsible for perceiving natural speech, and that this module can be further explored, and perhaps even reverse-engineered, by observing its behavior given various types of auditory input.

One exemplary case of linguistic output from obviously nonspeech input is the phenomenon known as the predominant pitch (PP) of vowels. It has long been observed that pure tones of different pitches evoke different vowel qualities for listeners. An early thorough and systematic psychophysical evaluation of this phenomenon was by Farnsworth (1937), who polled subjects hearing tones ranging in frequency from 375 to 2400 Hz as to which of 13 English vowels best matched each tone. The three most common choices were /u/, /o/, and /i/, with respective median pitches of 500, 550, and 1900 Hz; the back /ɔ/ and /ɑ/ had medians of 700 and 825 Hz and, taken together as they would be for speakers without that contrast, constituted the fourth most common choice. Other studies (Fant 1973, Kuhl et al.
1991) vary in their exact pitch ranges but confirm the trend: the back rounded vowels fall at the low end and the high front vowel at the high end. The identified pitch corresponds fairly well to the second vowel formant, although this is difficult to prove outright given the variance of vowel categories between speakers and the rather vague, noncategorical nature of the PP effect. Given the importance of F2 in phonetic categorization (especially for English vowels, where the distribution of rounding serves only to exaggerate the range of possible F2 values), it might not be surprising for a very sparse auditory input to be automatically attributed to this cue. If this is the case, then PP may be a very transparent application of speech perception to nonspeech auditory input, reflecting the same processes employed in normal speech listening.

A key difference between these stimuli and speech, however, beyond the obvious spectral differences, is in the temporal dynamics of the signal. Speech perception depends on processing both steady and time-varying spectra, but natural speech overwhelmingly comprises the latter. If PP does reflect the natural recruitment of speech perception, and not some other auditory effect, then it should also operate on stimuli with temporal modulation at rates similar to speech. The pure-tone stimuli previously used to gauge PP could fit the bill if modified to modulate in frequency over time. Of course, this would also change the prediction of which speech sound such a tone would evoke: if a steady tone evokes a vowel, then the analogous speech

1 This work was presented as a talk at the 166th meeting of the Acoustical Society of America in San Francisco, CA on December 3, 2013. The title that appears in the conference program is ‘Simple auditory elements induce perception of a phonetic feature’. This work was supported by NIH grant 1R01DC011295.
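The stimulus manipulation described above, taking a steady pure tone and letting its frequency move over time at a speech-like rate, can be sketched in a few lines. The following is a minimal illustration, not the paper's actual stimulus code: the function names, durations, and sample rate are assumptions, and NumPy is used for signal generation. The example frequencies (500 and 1900 Hz) are taken from Farnsworth's reported medians for /u/ and /i/.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed; not specified in the paper)

def pure_tone(freq_hz, dur_s=0.5, sr=SR):
    """A steady sinusoid, like the stimuli used to gauge predominant pitch."""
    t = np.arange(int(dur_s * sr)) / sr
    return np.sin(2 * np.pi * freq_hz * t)

def gliding_tone(f_start_hz, f_end_hz, dur_s=0.5, sr=SR):
    """A tone whose frequency moves linearly from f_start_hz to f_end_hz,
    adding speech-like temporal modulation to the pure-tone stimulus."""
    t = np.arange(int(dur_s * sr)) / sr
    # Integrate the instantaneous frequency to obtain a continuous phase,
    # so the glide has no discontinuities.
    inst_freq = np.linspace(f_start_hz, f_end_hz, t.size)
    phase = 2 * np.pi * np.cumsum(inst_freq) / sr
    return np.sin(phase)

# A steady 500 Hz tone (near the reported median for /u/) versus a glide
# from 500 to 1900 Hz (moving from the /u/ region toward the /i/ region).
steady = pure_tone(500)
glide = gliding_tone(500, 1900)
```

Cumulative-sum phase integration, rather than multiplying the time vector by a changing frequency, keeps the instantaneous frequency well defined throughout the glide; an equivalent result could be obtained with `scipy.signal.chirp`.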