Previous studies have struggled to identify measures beyond the audiogram that reliably predict speech-in-noise scores. This may be because (i) different mechanisms mediate performance depending on the materials and task, and (ii) reported effects are not reproducible. Here, 38 listeners with normal or near-normal audiograms completed batteries of temporal auditory and cognitive tests, as well as speech recognition (“Theo-Victor-Michael” test) in speech-shaped noise (SSN), speech-envelope-modulated noise (envSSN), and with one (1T) or two (2T) competing talkers. A two-stage Bayesian modeling approach was employed. In Stage 1, speech scores were corrected for target-word frequency and neighborhood density, psychometric-function parameters were extracted from the temporal tests, and the cognitive measures were reduced to three composite variables. In Stage 2, Gaussian process models predicted speech scores from the temporal and cognitive measures; leave-one-out cross-validation and model stacking determined the best combination of predictive models. Performance in SSN and envSSN was best predicted by temporal-envelope measures (forward masking, gap-duration discrimination), whereas performance in 1T was best predicted by cognitive measures (executive function, processing speed). Temporal-fine-structure measures (frequency-modulation and interaural-phase-difference detection) predicted the number of 1T distractor responses. All models failed to predict performance in 2T. These results show that predicting speech-in-noise scores from suprathreshold “process” measures is highly task dependent.
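The Stage 2 pipeline described above (Gaussian process regression, leave-one-out cross-validation, and model comparison across predictor sets) can be sketched as follows. This is a minimal illustration, not the study's implementation: the synthetic data, kernel hyperparameters, and inverse-error weighting scheme are assumptions (genuine Bayesian model stacking optimizes weights over pointwise predictive densities rather than squared errors).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 38  # number of listeners, as in the study

# Hypothetical predictor sets: two temporal-envelope measures and three
# cognitive composites (names and data are purely illustrative).
X_temporal = rng.normal(size=(n, 2))
X_cognitive = rng.normal(size=(n, 3))
# Synthetic "speech score" driven by the temporal predictors plus noise.
y = 0.8 * X_temporal[:, 0] - 0.5 * X_temporal[:, 1] + rng.normal(scale=0.3, size=n)

def rbf(A, B, length=1.0):
    """Squared-exponential (RBF) kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_loo_predictions(X, y, noise=0.1):
    """Leave-one-out GP posterior-mean predictions with fixed hyperparameters."""
    preds = np.empty_like(y)
    idx = np.arange(len(y))
    for i in idx:
        tr = np.delete(idx, i)
        K = rbf(X[tr], X[tr]) + noise * np.eye(len(tr))
        k_star = rbf(X[[i]], X[tr])
        # GP predictive mean: k* K^{-1} y
        preds[i] = k_star @ np.linalg.solve(K, y[tr])
    return preds

models = {"temporal": X_temporal, "cognitive": X_cognitive}
loo = {name: gp_loo_predictions(X, y) for name, X in models.items()}
mse = {name: np.mean((y - p) ** 2) for name, p in loo.items()}

# Crude stacking weights from inverse LOO error; the model with better
# out-of-sample predictions receives the larger weight.
inv = np.array([1.0 / mse[m] for m in models])
weights = inv / inv.sum()
print(dict(zip(models, weights.round(3))))
```

In this toy setup the temporal model should receive most of the stacking weight, mirroring the finding that different predictor families dominate under different masker conditions.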