The perception of sound textures, a class of natural sounds such as fire, wind, and rain that are defined by their statistical structure, has been proposed to arise through the integration of time-averaged summary statistics. However, where and how the auditory system encodes these summary statistics to create internal representations of such stationary sounds is unknown. Here, using natural textures and synthetic variants with reduced statistics, we show that summary statistics modulate the correlations between frequency-organized neuron ensembles in the awake rabbit inferior colliculus (IC). These neural ensemble correlation statistics capture high-order sound structure and allow for accurate neural decoding in a single-trial recognition task, with evidence accumulation times approaching 1 s. In contrast, the average activity across the neural ensemble (the neural spectrum) provides a fast (tens of milliseconds) and salient signal that contributes primarily to texture discrimination. Intriguingly, perceptual studies in human listeners reveal analogous trends: the sound spectrum is integrated quickly and serves as a salient discrimination cue, while high-order sound statistics are integrated slowly and contribute substantially more toward recognition. The findings suggest that statistical sound cues such as the sound spectrum and correlation structure are represented by distinct response statistics in auditory midbrain ensembles, and that these neural response statistics may have dissociable roles and time scales for the recognition and discrimination of natural sounds.
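
To make the two response statistics concrete, the following is a minimal sketch of how a neural spectrum and an ensemble correlation matrix could be computed from multi-channel activity. It is an illustration under stated assumptions, not the analysis used in the study: the array layout (channels by time samples), the function names `neural_spectrum` and `ensemble_correlations`, the use of zero-lag Pearson correlation, and the simulated data are all hypothetical.

```python
import numpy as np


def neural_spectrum(activity):
    """Time-averaged activity of each channel (hypothetical helper).

    activity: array of shape (n_channels, n_samples).
    """
    return activity.mean(axis=1)


def ensemble_correlations(activity):
    """Zero-lag Pearson correlation between every pair of channels.

    Assumption: the study may use a different (e.g. time-lagged) measure.
    """
    return np.corrcoef(activity)


# Simulated stand-in for recorded activity (not real data).
rng = np.random.default_rng(0)
n_channels, n_samples = 4, 5000
shared = rng.standard_normal(n_samples)
activity = rng.standard_normal((n_channels, n_samples))
activity[:2] += shared                       # channels 0 and 1 share a common drive
activity += np.arange(n_channels)[:, None]   # a different mean level per channel

spectrum = neural_spectrum(activity)      # one mean value per channel
corr = ensemble_correlations(activity)    # (n_channels, n_channels) matrix
```

In this toy example the mean offsets change only `spectrum`, while the shared drive changes only `corr`, which mirrors the idea that the two statistics can carry separate information. A mean can also be estimated from a short window, whereas a correlation estimate generally needs more samples to stabilize.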