Within species, vocal and auditory systems presumably coevolved to converge on a critical temporal acoustic structure that can be best produced and perceived. While dogs cannot produce articulated sounds, they respond to speech, raising the question as to whether this heterospecific receptive ability could be shaped by exposure to speech or remains bounded by their own sensorimotor capacity. Using acoustic analyses of dog vocalisations, we show that their main production rhythm is slower than the dominant (syllabic) speech rate, and that human-dog-directed speech falls halfway in between. Comparative exploration of neural (electroencephalography) and behavioural responses to speech reveals that comprehension in dogs relies on a slower speech rhythm tracking (delta) than humans' (theta), even though dogs are equally sensitive to speech content and prosody. Thus, the dog audio-motor tuning differs from humans', and we hypothesise that humans may adjust their speech rate to this shared temporal channel as means to improve communication efficacy.