This is a response to the commentary on Vouloumanos and Werker (2007) by Rosen and Iverson (2007).

Are humans born with a bias for listening to the vocalizations of their species? In Vouloumanos and Werker (2007, this issue), we present data demonstrating that from birth, the human infant prefers listening to speech compared with non-speech sounds that mimic spectral and temporal properties of speech. Rosen and Iverson (2007, this issue) criticize this interpretation. First, they argue that the preference we have shown is based on voice melody rather than speech per se. Second, they argue that such a voice melody preference likely stems from prenatal learning rather than from an innate bias. We did not make this claim in Vouloumanos and Werker (2007), but it is addressed by new data we present here.

Turning to Rosen and Iverson's first point, we agree that a voice melody account of newborns' preference for speech is not altogether impossible, but we find it implausible. Voice melody, or pitch, is the subjective highness or lowness of a sound as perceived by the human ear. Although pitch extraction is not fully understood (e.g. Patel & Balaban, 2001), in natural speech pitch generally corresponds to the fundamental frequency (F0) of an utterance. F0 is the frequency at which a particular speaker's vocal folds vibrate, typically around 200 Hz for a woman's voice. Because of the resonance properties of sound, F0 is reflected in 'harmonics' at integer multiples of F0. For example, an F0 of 150 Hz, itself the first harmonic, will have harmonics at 300 Hz, 450 Hz, 600 Hz, and so on. These harmonics contribute to the perception of pitch, for example when F0 is missing. Research on infant pitch perception is limited, but it has shown that 7-month-old infants demonstrate some adult-like characteristics in their perception of pitch (Montgomery & Clarkson, 1997). Even at this age, however, there is considerable variation in individual infants' abilities to recover pitch when F0 is missing (Clarkson, 1992).
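The harmonic relation described above can be illustrated with a short Python sketch. It lists the first few harmonics of a given F0. It then shows that when F0 itself is removed, the spacing shared by the remaining harmonics still implies the same fundamental. The function names `harmonics` and `implied_fundamental` are hypothetical. Taking the greatest common divisor of integer frequencies is a deliberately idealized stand-in for pitch extraction, not a model of how adult or infant listeners actually recover a missing fundamental.

```python
from functools import reduce
from math import gcd


def harmonics(f0_hz: int, count: int) -> list[int]:
    """Return the first `count` harmonics of f0_hz in Hz.

    The first harmonic is F0 itself; the n-th harmonic is n * F0.
    """
    return [f0_hz * n for n in range(1, count + 1)]


def implied_fundamental(partials_hz: list[int]) -> int:
    """Hypothetical, idealized 'missing fundamental' estimate.

    Returns the greatest common divisor of integer-valued harmonic
    frequencies. This is a toy simplification for illustration only,
    not a model of auditory pitch extraction.
    """
    return reduce(gcd, partials_hz)


h = harmonics(150, 4)
print(h)  # [150, 300, 450, 600]

# Drop F0 (the first harmonic), keeping 300, 450 and 600 Hz.
upper = h[1:]
print(implied_fundamental(upper))  # 150
```

The GCD is used here only because exact integer multiples make the shared spacing easy to see. Real speech harmonics are neither exact nor integer-valued, so this sketch should be read as arithmetic, not psychoacoustics.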
Pitch extraction in younger infants is currently poorly understood but is believed to differ from adult pitch perception (Bundy, Colombo & Clarkson, 1992). Neonates are sensitive to pitch contours. They discriminate, for example, high-low pitch from low-high pitch in bimoraic stimuli (Nazzi, Floccia & Bertoncini, 1998). However, the mechanism of pitch extraction in neonates has not been investigated.

To examine neonates' preference for speech, we used non-speech sounds that were a variant on sine-wave analogues (SWA) of speech (Remez, Rubin, Pisoni & Carrell, 1981). SWA consist of time-varying sinusoidal waves, or sine waves, that track the centre frequencies of the energy bands (formants) of natural speech, reproducing the changes in these frequency peaks across time. SWA are typically composed of three sine waves that reproduce the changes in the first three formants of speech. They do so adroitly enough that, under the right circumstances, adult listeners perceive SWA as intelligible (if weird) speech (Remez et al., 1981). At stake here is which component of our SWA conveyed the perception of pitch. Rosen and Iverson note that the first formant (F1) is usually heard as conveying pitch in SWA, even with the addition of F0 (Remez & Rubin, 1984). From this they suggest that F1 likely conveyed perceived pitch in our stimuli as well, and thus that the voice melody perceived in our SWA was less salient than that in the speech set. We would argue that the F0 component in our SWA was salient, and that it, rather than F1, accounted for the perceived pitch. The key lies in the construction of the stimuli by Sonya Bird and Guy Carden, of the University of Victoria and the University of British Columbia, respectively. While creating the SWA, they found that the first three formants (F1, F2, and F3) were virtually identical across the multiple natural speech tokens. For this reason, they selected one representative set of the first three formants