A primary goal of auditory neuroscience is to understand how conspecific sounds are represented, learned, and mapped onto behaviorally meaningful constructs. Speech signals, the conspecific sounds of humans, are multidimensional, acoustically variable, and temporally ephemeral. A central computational challenge in speech perception (and in audition more broadly) is categorization: mapping continuous, multidimensional, and variable acoustic signals onto discrete behavioral equivalence classes. Despite the enormity of this computational challenge, native speech perception is rapid and automatic. In contrast, learning novel speech categories is effortful and is considered one of the most challenging categorization tasks for the mature brain. I will discuss three lines of ongoing research that use multimodal neuroimaging, computational modeling, and behavioral training approaches to a) examine the multiple neural systems underlying successful L2 speech category learning in adulthood, b) assess the sources of individual differences in L2 speech category learning, and c) design optimal, neurobiologically constrained training paradigms that reduce inter-individual differences in L2 speech category learning. These studies will provide insight into two fundamental questions: Consistent with prior work on visual category learning, are multiple neural systems involved in speech categorization? And what is the role of emerging expertise and individual differences in mediating the neural and computational processes involved in speech category learning?