Labeled structural responses simulated with Finite Element (FE) methods can be used to construct training sets for monitoring the health state of structures. Such an approach is attractive because it avoids the cost of experiments. However, its main obstacle is the simulation error inherent in such numerical responses. This error contaminates the training data and can lead to wrong conclusions when the learned features are applied to subsequent online experimental data. It is therefore important to estimate how suitable the available simulated data are for a given problem. The present work proposes a novel methodology in which classifiers trained on FE vibration response data are fed with variously perturbed inputs, thereby emulating the generalization gap (classification error) caused by simulation errors. The classifier outputs are then collected into a dataset, which is used to construct a function that estimates the generalization gap from the perturbed intact-state response. Results are presented that show the behavior of the proposed gap estimation function for different binary damage classification scenarios. A numerical truss structure is used to test the concept. Convolutional Neural Networks (CNNs) are used for all data-driven tasks, such as learning health-state features from vibration responses and evaluating the gap function. The results indicate that the methodology is a promising means of estimating the reliability of numerically trained classifiers. In the final part of this study, the framework is tested on an experimental truss structure. The presented methodology contributes to the quantitative study of uncertainty when extrapolating from the numerical to the experimental domain in the presence of modeling errors.
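The perturb, measure, and regress loop summarized above can be illustrated with a toy sketch. It rests on heavy simplifying assumptions that are not taken from the study. A single-mode damped sinusoid stands in for the FE truss response, and damage is modeled as a drop in natural frequency. The modeling error is emulated as a global frequency scaling `alpha`. A nearest-centroid classifier on magnitude spectra replaces the CNN classifier, and k-nearest-neighbour regression on the peak frequency of the intact response replaces the CNN gap function. All names (`responses`, `perturbed_set`, `intact_feature`, `estimate_gap`) and all numerical values are hypothetical.

```python
import numpy as np

# Hypothetical toy parameters; none of these values come from the study.
FS, N = 100.0, 400                    # sampling rate [Hz], samples per record
T = np.arange(N) / FS                 # 4 s record -> 0.25 Hz frequency bins
FREQS = np.fft.rfftfreq(N, 1.0 / FS)
F_INTACT, F_DAMAGED = 10.0, 9.0       # damage modeled as a frequency drop

def responses(freq, n, rng):
    """n noisy free-decay responses of a single-mode toy 'structure'."""
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    clean = np.exp(-0.2 * T) * np.sin(2.0 * np.pi * freq * T + phase)
    return clean + 0.1 * rng.standard_normal((n, N))

def spectra(x):
    """Magnitude spectra, used as the feature vector of each response."""
    return np.abs(np.fft.rfft(x, axis=-1))

class NearestCentroid:
    """Stand-in for the CNN: label = class whose mean spectrum is closest."""
    def fit(self, x, y):
        s = spectra(x)
        self.centroids = np.stack([s[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def predict(self, x):
        s = spectra(x)
        d = np.linalg.norm(s[:, None, :] - self.centroids[None, :, :], axis=-1)
        return d.argmin(axis=1)

def perturbed_set(alpha, n, rng):
    """Emulate a modeling error as a global frequency scaling by alpha."""
    x = np.vstack([responses(alpha * F_INTACT, n, rng),
                   responses(alpha * F_DAMAGED, n, rng)])
    y = np.repeat([0, 1], n)          # 0 = intact, 1 = damaged
    return x, y

def intact_feature(x_intact):
    """Scalar summary of the (perturbed) intact response: its peak frequency."""
    return FREQS[spectra(x_intact).mean(axis=0).argmax()]

rng = np.random.default_rng(0)
# 1) Train on unperturbed "simulated" data (alpha = 1 means a perfect model).
x_train, y_train = perturbed_set(1.0, 100, rng)
clf = NearestCentroid().fit(x_train, y_train)

# 2) Perturb the inputs and record (intact-state feature, classification error).
features, gaps = [], []
for alpha in np.linspace(0.85, 1.15, 31):
    x, y = perturbed_set(alpha, 50, rng)
    features.append(intact_feature(x[y == 0]))
    gaps.append(np.mean(clf.predict(x) != y))
features, gaps = np.array(features), np.array(gaps)

# 3) Gap function: k-nearest-neighbour regression (stand-in for the CNN).
def estimate_gap(x_intact, k=3):
    dist = np.abs(features - intact_feature(x_intact))
    nearest = np.argsort(dist, kind="stable")[:k]
    return gaps[nearest].mean()
```

Only intact-state measurements are needed to query the gap function. For example, `estimate_gap(responses(0.9 * F_INTACT, 20, rng))` should return a large estimated error, signalling that a classifier trained on the idealized model is unreliable for a structure whose frequencies are about 10% lower than predicted. This mirrors the practical constraint that, online, the damaged state is usually not yet observable.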