Parallel distributed processing (PDP) models have had a profound impact on the study of cognition. One domain in which they have been particularly influential is learning quasiregularity, in which mastery requires both learning regularities that capture the majority of the structure in the input and learning exceptions that violate those regularities. How PDP models learn quasiregularity is still not well understood. Small- and large-scale analyses of a feedforward, 3-layer network were carried out to address 2 fundamental issues about network functioning: how the model can learn both regularities and exceptions without sacrificing generalizability, and the nature of the hidden representation that makes this learning possible. Results show that capacity-limited learning pressures the network to form componential representations, which ensures good generalizability. Small and highly local perturbations of this representational system allow exceptions to be learned while minimally disrupting generalizability. Theoretical and methodological implications of the findings are discussed.
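The kind of architecture described above can be illustrated with a minimal sketch: a feedforward, 3-layer network trained by backpropagation on a toy quasiregular task. This is not the authors' model or task; the identity-mapping "rule," the single exception item, and all sizes and hyperparameters here are hypothetical choices made only to show the regularity-plus-exception setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quasiregular task (hypothetical): the "rule" is the identity
# mapping over one-hot patterns, but one exception item violates it.
n = 6
X = np.eye(n)          # six "regular" one-hot items
Y = X.copy()
Y[0] = 1 - Y[0]        # item 0 is the exception: its output is inverted

# 3-layer feedforward net (input -> hidden -> output) with sigmoid units
H = 4
W1 = rng.normal(0, 0.5, (n, H))
W2 = rng.normal(0, 0.5, (H, n))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for epoch in range(2000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    err = out - Y
    losses.append(float((err ** 2).mean()))
    # Backpropagate through the sigmoid output and hidden layers
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

The capacity limit in the abstract corresponds here to the small hidden layer (4 units for 6 items), which forces the network to share hidden structure across the regular items while carving out a local adjustment for the exception.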