The perceptual system for speech is highly organized from early infancy. This organization bootstraps young human learners' ability to acquire their native speech and language from speech input. Here, we review behavioral and neuroimaging evidence that perceptual systems beyond the auditory modality are also specialized for speech in infancy, and that motor and sensorimotor systems can influence speech perception even in infants too young to produce speech-like vocalizations. These investigations complement existing literature on infant vocal development and on the interplay between speech perception and production systems in adults. We conclude that a multimodal speech and language network is present before speech-like vocalizations emerge.