A remarkable characteristic of human speech perception is the ability to recognize speech even when the speech signal is highly degraded. It has been argued that this ability reflects speech-specific mechanisms. The present study tested this hypothesis by measuring the ability of chinchillas to recognize noise-vocoded (NV) versions of naturally spoken monosyllabic words, using operant conditioning in a stimulus generalization paradigm. Chinchillas did not generalize to the vocoded words as though they were perceptually equivalent to the naturally spoken words: their responses to the vocoded words fell well below their responses to the naturally spoken words. In this case, pitch cues rather than speech cues may have been controlling the behavioral responses. To reduce pitch cues, the chinchillas were retrained using 64-channel NV words. Their responses to the vocoded test words were then similar to their responses to the 64-channel versions, and similar to those obtained from human listeners. However, their responses to time-reversed versions were also high and similar to their responses to time-normal versions, suggesting that the cue controlling behavioral responses was not the phonetic structure of the words. These results show that chinchillas used different acoustic cues from those used by human listeners. The ability of chinchillas to recognize NV words as perceptually equivalent to the naturally spoken versions is inferior to that of human listeners. The findings suggest that the ability of human listeners to recognize highly degraded speech is unlikely to be based solely on the general auditory and perceptual mechanisms common among mammals. (PsycINFO Database Record (c) 2019 APA, all rights reserved).