Abstract
Many industrial applications require turbulence closure models that yield accurate predictions across a wide spectrum of flow regimes. In this study, we investigate how data-driven augmentations of popular eddy viscosity models affect their generalization properties. We perform a systematic generalization study with a particular closure model that was trained for a single flow regime. We gradually increase the complexity of the test cases up to an industrial application governed by a multitude of flow patterns, and thereby demonstrate that tailoring a model to a specific flow phenomenon decreases its generalization capability. In fact, the accuracy gain in regions that the model was explicitly calibrated for is smaller than the loss elsewhere. We furthermore show that extrapolation or, more generally, a lack of training samples with a similar feature vector is not the main reason for generalization errors: the two are only weakly correlated. Accordingly, generalization errors are probably due to a data mismatch, i.e., a systematic difference in the mappings from the model inputs to the required responses. More diverse training sets are unlikely to provide a remedy because of the strict stability requirements emerging from the ill-conditioned RANS equations. The universality of data-driven eddy viscosity models with variable coefficients is, therefore, inherently limited.