The acoustic realization of speech sounds may differ substantially across talkers, such that (for example) a sound perceived as “s” in one voice might be perceived as “sh” in another. In this talk, I will review select neurobiological and computational data illustrating how listeners cope with this variability. I will also review behavioral data showing how talker variability can hinder processing, with slower and/or less accurate responses when the listening environment contains multiple voices rather than a single voice. I argue that at least three mechanisms underlie these multi-talker processing costs. First, multi-talker processing costs may arise because talker changes disrupt a listener’s ability to attend to the target talker. Second, I make the novel claim that talker changes may impose processing penalties by inducing uncertainty about whether the current acoustic-to-phonetic mapping remains appropriate. This claim is motivated by recent data showing that when a preceding carrier phrase attenuates multi-talker processing costs, the attenuation manifests as slowed responses to single-talker speech rather than as improved processing of mixed-talker speech. Finally, talker variability costs may result from making on-the-fly adjustments to the mapping between acoustics and phonetic categories. Overall, this talk highlights the ways in which listeners contend with, and are hindered by, talker variability.