ABSTRACT The easy-to-hard effect is a phenomenon in which training first on easy perceptual discrimination contrasts and then on hard contrasts yields greater perceptual learning than training on hard contrasts alone for the same amount of time. However, easy-to-hard effects can be erased if the initial easy trials are made too easy. Here, we show that this seemingly paradoxical result emerges naturally from an artificial neural network model of perceptual learning based on the incremental differentiation of stimulus representations: the self-organizing map (SOM). Like human listeners, the networks show a sweet spot for easy-to-hard effects when the easy contrast is of intermediate difficulty, neither too easy nor too hard. This trend is apparent across a wide range of free-parameter values. Analysis of network learning dynamics shows that easy-to-hard effects manifest when competition for network representational space is reduced (relative to constant hard training) but the learned stimuli are still similar enough to the hard contrast stimuli to allow generalization. The simulation results show that an incremental perceptual learning model can account for subtle characteristics of easy-to-hard effects that learning theories grounded in attentional discovery of relevant features cannot. We suggest several avenues for developing the model further to account for a wider range of auditory learning phenomena, for using it to optimize auditory training regimens, and for clarifying perceptual learning mechanisms.
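The training comparison summarized above can be illustrated with a minimal sketch of a one-dimensional SOM trained on scalar stimuli. This is not the authors' model: the function names (`train_som`, `run_schedule`, `hard_separation`), the scalar stimulus encoding, the learning rate, the neighborhood width, the trial counts, and the separation measure are all assumptions chosen for illustration only. The sketch trains on an easy contrast for the first half of the trials and on the hard contrast for the second half; setting `easy_gap` equal to `hard_gap` gives constant hard training.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight is closest to stimulus x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))


def train_som(weights, stimuli, n_trials, rng, lr=0.1, sigma=1.5):
    """Update a 1-D SOM in place using n_trials randomly drawn scalar stimuli."""
    for _ in range(n_trials):
        x = rng.choice(stimuli)
        bmu = best_matching_unit(weights, x)
        for i in range(len(weights)):
            # Gaussian neighborhood: nodes near the winner move more.
            h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
            weights[i] += lr * h * (x - weights[i])


def run_schedule(easy_gap, hard_gap, n_trials=400, n_nodes=20, seed=0):
    """Train on the easy contrast for half the trials, then on the hard one.

    Stimuli are hypothetical scalars placed symmetrically around 0.5.
    Returns the trained weights and the two hard-contrast stimuli.
    """
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_nodes)]
    center = 0.5
    easy = [center - easy_gap / 2, center + easy_gap / 2]
    hard = [center - hard_gap / 2, center + hard_gap / 2]
    half = n_trials // 2
    train_som(weights, easy, half, rng)
    train_som(weights, hard, n_trials - half, rng)
    return weights, hard


def hard_separation(weights, hard):
    """Illustrative differentiation score: map distance between the
    best-matching units of the two hard-contrast stimuli."""
    first, second = (best_matching_unit(weights, x) for x in hard)
    return abs(second - first)


# Compare several easy-contrast sizes; 0.05 corresponds to constant hard training.
for gap in (0.05, 0.2, 0.4, 0.9):
    w, hard = run_schedule(easy_gap=gap, hard_gap=0.05)
    print(gap, hard_separation(w, hard))
```

The separation score here is only one crude proxy for how well the map differentiates the hard stimuli, and a toy map of this size should not be expected to reproduce the reported sweet spot. The sketch is meant only to make the schedule manipulation concrete.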