When two people speak at the same time, it is easier to understand what either talker is saying if the voices differ in fundamental frequency (F0). Talkers differ in vocal tract size as well as F0, and their formant frequencies occupy different ranges, providing a further basis for voice segregation. To investigate the effects of vocal tract size, a set of declarative English sentences was produced by an adult male and processed using a speech vocoder. Sentences were presented in pairs, with the spectrum envelope of one sentence in each pair shifted up or down by a fixed percentage. A +20% shift (sufficient to shift the formants of an adult male into the female range) led to a 6% increase in word recognition accuracy. A −20% shift led to a 9% drop in accuracy, consistent with listeners’ informal reports that the “larger” voice sounded muffled. Benefits of upward shifts and adverse effects of downward shifts were restricted to the “shifted” member of the pair. Upward spectral shifts were accompanied by increased spectral tilt, while downward spectral shifts led to reduced spectral tilt. Hence the observed effects may be due to spectral masking, rather than sensitivity to vocal tract size per se.
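
As an illustration of what a fixed-percentage shift of the spectrum envelope means, the following minimal sketch scales the frequency axis of a toy envelope by a constant factor. It is not the vocoder used in the study, whose details are not given here. The function names (`shift_envelope`, `peak_frequency`), the single Gaussian "formant" at 500 Hz, and the 10 Hz frequency grid are all assumptions made for the example.

```python
import numpy as np


def shift_envelope(freqs, envelope, percent):
    """Scale the frequency axis of a spectrum envelope by a fixed percentage.

    Hypothetical helper: the shifted envelope at frequency f takes the value
    of the original envelope at f / factor, where factor = 1 + percent / 100.
    Frequencies that map outside the grid are clamped to the edge values,
    which is the default behaviour of np.interp.
    """
    factor = 1.0 + percent / 100.0
    return np.interp(freqs / factor, freqs, envelope)


def peak_frequency(freqs, envelope):
    """Return the frequency at which the envelope is largest."""
    return float(freqs[np.argmax(envelope)])


# Toy envelope (assumed for illustration): one formant-like peak at 500 Hz.
freqs = np.arange(0.0, 5001.0, 10.0)
envelope = np.exp(-0.5 * ((freqs - 500.0) / 50.0) ** 2)

shifted_up = shift_envelope(freqs, envelope, 20.0)     # +20% shift
shifted_down = shift_envelope(freqs, envelope, -20.0)  # -20% shift

print(peak_frequency(freqs, envelope))
print(peak_frequency(freqs, shifted_up))
print(peak_frequency(freqs, shifted_down))
```

Under these assumptions, a +20% shift multiplies every envelope peak frequency by 1.2 and a −20% shift multiplies it by 0.8. The sketch moves the envelope only; it does not model F0, and it does not reproduce the accompanying changes in spectral tilt noted in the abstract.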