Neural entrainment, the alignment between neural oscillations and rhythmic stimulation, is omnipresent in current theories of speech processing; nevertheless, the underlying neural mechanisms are still largely unknown. Here, we hypothesized that laminar recordings in non-human primates can provide important insight into these mechanisms, in particular with respect to processing in cortical layers. We presented one monkey with everyday human speech sounds and recorded neural oscillations (as current-source density, CSD) in primary auditory cortex (A1). We observed that the high-excitability phase of neural oscillations was aligned only with those spectral components of speech to which the recording site was tuned; the opposite, low-excitability phase was aligned with other spectral components. As low- and high-frequency components in speech alternate, this finding might reflect a particularly efficient way of stimulus processing that includes the preparation of the relevant neuronal populations for the upcoming input. Moreover, when presenting speech/noise sounds without systematic fluctuations in amplitude and spectral content, as well as their time-reversed versions, we found significant entrainment in all conditions and cortical layers. Compared with everyday speech, entrainment in the speech/noise conditions was characterized by a change in the phase relation between neural signal and stimulus, and the low-frequency neural phase was predominantly coupled to activity in a lower gamma band. These results show that neural entrainment in response to speech without slow fluctuations in spectral energy includes a process with specific characteristics that is presumably preserved across species.