Previous work has demonstrated that amplitude-modulated time-varying sinusoidal (AMTVS) replicas of natural sentences were more intelligible than simple time-varying sinusoidal (TVS) sentences [T. D. Carrell, J. Acoust. Soc. Am. Suppl. 1 84, S158 (1988)]. More recent work indicated that this improvement was very likely based on the grouping effects of comodulation masking release [T. D. Carrell, J. Acoust. Soc. Am. Suppl. 1 86, S102 (1989)]. It was further proposed that amplitude modulation, which is often found in natural signals such as voiced speech, comodulates the components of a sound from a single source and may be used to help separate an auditory figure from its background in natural listening environments. The present experiment tested this claim directly. Eight TVS sentences and eight AMTVS sentences were presented in a background of multispeaker babble at a signal-to-noise ratio of +6 dB. Listeners correctly perceived only 36.9% of the phonemes in the unmodulated TVS sentences, whereas they correctly perceived 72.3% of the phonemes in the AMTVS sentences. These results suggest that the amplitude comodulation of the components of a speech sound allows them to be grouped together and segregated from simultaneous sounds from other sources. This grouping and segregation is claimed to account for the improved intelligibility that was found.
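As a rough illustration of the stimulus manipulation described above, the following sketch builds a TVS-like signal from a few time-varying sinusoids, applies one shared amplitude envelope to all components (comodulation), and mixes the result with a masker at +6 dB signal-to-noise ratio. It is not the procedure used in the experiment: the frequency tracks, the 100 Hz modulation rate, the raised-cosine envelope, the sample rate, and the Gaussian-noise stand-in for multispeaker babble are all assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math
import random


def tvs(freq_tracks, sample_rate):
    """Sum of sinusoids whose frequencies follow per-sample tracks (in Hz)."""
    n = len(freq_tracks[0])
    out = [0.0] * n
    for track in freq_tracks:
        phase = 0.0
        for i, f in enumerate(track):
            phase += 2.0 * math.pi * f / sample_rate  # integrate frequency to phase
            out[i] += math.sin(phase)
    return out


def comodulate(signal, mod_hz, sample_rate):
    """Multiply the summed signal by one shared raised-cosine envelope."""
    return [
        s * 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_hz * i / sample_rate))
        for i, s in enumerate(signal)
    ]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(signal, noise, snr_db):
    """Scale noise so that 20*log10(rms(signal)/rms(noise)) equals snr_db, then add."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    scaled = [gain * v for v in noise]
    return [s + v for s, v in zip(signal, scaled)], scaled


# Hypothetical parameters (assumptions, not taken from the abstract).
sr = 8000
n = sr // 2
tracks = [
    [500.0 + 200.0 * i / n for i in range(n)],   # rising low component
    [1500.0 - 300.0 * i / n for i in range(n)],  # falling mid component
    [2500.0] * n,                                # steady high component
]

plain = tvs(tracks, sr)                   # TVS-like signal
modulated = comodulate(plain, 100.0, sr)  # AMTVS-like signal (assumed 100 Hz rate)

rng = random.Random(0)
masker = [rng.gauss(0.0, 1.0) for _ in range(n)]  # noise stand-in, NOT real babble

mixture, scaled_masker = mix_at_snr(modulated, masker, 6.0)
achieved_snr_db = 20.0 * math.log10(rms(modulated) / rms(scaled_masker))
```

Because the envelope is applied after the components are summed, every component carries the same modulation, which is the property the grouping account relies on; modulating each component with an independent envelope would not be comodulation.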