THE “SPECIALNESS” OF SPEECH

As is apparent from reading the first line of nearly any research or review article on speech, the task of perceiving speech sounds is complex, and the ease with which humans acquire, produce and perceive these sounds is remarkable. Despite the growing appreciation for the complexity of music perception, speech perception remains one of the most amazing and poorly understood auditory (and, if we may be so bold, perceptual) accomplishments of humans. Over the years, there has been considerable debate over whether this achievement is the result of general perceptual/cognitive mechanisms or of “special” processes dedicated to the mapping of speech acoustics to linguistic representations (for reviews see Trout, 2001; Diehl et al., 2004). The most familiar proposal of the “specialness” of speech perception comes in the various incarnations of the Motor Theory of speech proposed by Liberman and colleagues (Liberman et al., 1967; Liberman and Mattingly, 1985, 1989).

Given the status of research into audition in the 1950s and 1960s, it is not surprising that speech appeared to require processing not available in “normal” hearing. Much of the work at the time used relatively simple tones and noises to get at the basic psychoacoustics underlying the perception of pitch and loudness (though some researchers, such as Harvey Fletcher, were also working on the basics of speech perception; Fletcher and Galt, 1950; Allen, 1996). Liberman and his collaborators discovered that the discrimination of acoustic changes in speech sounds did not look like the psychoacoustic measures of discrimination for pitch and loudness. Instead of following a Weber or Fechner law, the discrimination function had a peak near the categorization boundary between contrasting phonemes—a pattern of perceptual results referred to as Categorical Perception (Liberman et al., 1957). In addition, the acoustic cues to phonemic identity were not readily apparent: similar spectral patterns could result in different phonemic percepts, and acoustically disparate patterns could result in identical phonemic percepts—the problem of “lack of invariance” (e.g., Liberman et al., 1952). The perception of these varying acoustic patterns was highly sensitive to preceding and following phonetic context in ways that appeared specific to the communicative constraints of speech and not applicable to the perception of other sounds—as in demonstrations of perceptual compensation for coarticulation, speaking-rate normalization and talker normalization (e.g., Ladefoged and Broadbent, 1957; Miller and Liberman, 1979; Mann, 1980).

One major source of evidence in favor of a Motor Theory account of speech perception is that information about a speaker’s production (anatomy or kinematics) from non-auditory sources can affect phonetic perception. The famed McGurk effect (McGurk and MacDonald, 1976), in which visual presentation of a talker can alter the auditory phonetic percept, is taken as evidence that listeners integrate information about production from this secondary source. Fowler and Dekle (1991) demonstrated a similar effect using haptic information gathered by touching the speaker’s face (see also Sato et al., 2010). Gick and Derrick (2009) reported that perception of consonant-vowel tokens in noise is biased toward voiceless stops (e.g., /pa/) when the tokens are accompanied by a small burst of air on the skin of the listener, which could be interpreted as the aspiration that would more likely accompany the release of a voiceless stop.
In addition, several studies have demonstrated that manipulations of the listener’s articulators can affect perception, a finding supportive of the Motor Theory proposal that the mechanisms of production underlie the perception of speech. For example, Ito et al. (2009) obtained shifts in phoneme categorization resulting from external manipulation of the skin around the listener’s mouth in ways that correspond to the deformations typical of producing those speech sounds (see also Yeung and Werker, 2013 for a similar demonstration with infants). More recently, Mochida et al. (2013) found that the ability to categorize consonants can be influenced by the simultaneous silent production of those consonants. Typically, these studies are proffered as evidence for a direct role of speech motor processing in speech perception. Independent of this proposed motor basis of perception, others have suggested the existence of a special speech or phonetic mode of perception based on evidence that neural and behavioral responses to the same stimuli are modulated by whether or not the listener believes the signal to be speech or non-speech (e.g., Tomiak et al., 1987; Vroomen and Baart, 2009; Stekelenburg and Vroomen, 2012).