Voice pitch (F0) and vocal-tract length (VTL) are two principal voice characteristics that play a major role in segregating voices in cocktail-party situations. Cochlear implant (CI) listeners struggle to take advantage of voice differences in such situations, and recent studies show that they have difficulty discriminating voices on the basis of these two cues. Yet the mechanisms underlying the perception of these cues remain largely unknown, in CI users as well as in normal-hearing (NH) listeners. In CIs, F0 can be coded temporally and spectrally in the lower frequency channels, but some recent studies have suggested that the spectral centroid (SC) could be used instead. VTL could be perceived through its effect on individual formants, but it is also often likened to timbre, and it has therefore been argued that VTL perception might rely on SC. However, these assumptions tend to overlook the SC variability that occurs in natural speech. Using a signal detection theory model, this study shows how that variability would affect F0 and VTL just-noticeable differences (JNDs) in NH and CI listeners if those JNDs were based on SC. The results suggest that SC is unlikely to be a reliable cue for vocal F0 and VTL perception in natural speech.
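To make the argument concrete, the following minimal sketch illustrates how natural SC variability could set a floor on an SC-based JND. It is not the model used in the study: it assumes an equal-variance Gaussian observer in a two-interval forced-choice task, with SC expressed in semitones and independent SC variability (standard deviation `sigma_st`) in each interval. All function names and the choice of task are hypothetical and introduced for illustration only.

```python
import math
from statistics import NormalDist


def spectral_centroid(freqs_hz, magnitudes):
    """Amplitude-weighted mean frequency of a magnitude spectrum."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("magnitude spectrum sums to zero")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def semitones(f_hz, ref_hz):
    """Express a frequency ratio in semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def percent_correct_2afc(delta_st, sigma_st):
    """Proportion correct for an SC shift of delta_st semitones.

    Assumption: each interval carries independent Gaussian SC variability
    with standard deviation sigma_st, so the difference between intervals
    has standard deviation sigma_st * sqrt(2).
    """
    return NormalDist().cdf(delta_st / (sigma_st * math.sqrt(2.0)))


def sc_based_jnd(sigma_st, target_pc=0.707):
    """Smallest SC shift (semitones) reaching target_pc under the same assumptions."""
    return sigma_st * math.sqrt(2.0) * NormalDist().inv_cdf(target_pc)
```

Under these assumptions the predicted JND scales linearly with the SC variability: calling `sc_based_jnd(2 * sigma)` returns twice the value of `sc_based_jnd(sigma)`, so the larger the token-to-token SC spread in natural speech, the less useful SC becomes as a discrimination cue.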