Leading edge surface erosion is an emerging issue in wind turbine blade reliability. It reduces power performance, causes aerodynamic load imbalance, increases noise emission and, ultimately, adds maintenance costs; if left untreated, it compromises the functionality of the blade. In this work, we first propose an empirical spatio-temporal stochastic model for simulating leading edge erosion, to be used in conjunction with aeroelastic simulations. We then present a deep learning model, trained on the simulated data, that monitors leading edge erosion by detecting and classifying its degradation severity. Such a tool could help wind farm operators reduce maintenance costs by planning cleaning and repair activities more efficiently. The core of the stochastic model is a damage process that progresses at random times across multiple discrete states and is characterized by a non-homogeneous compound Poisson process. This process describes the random, time-dependent degradation of the blade surface and thereby implicitly affects its aerodynamic properties. The model allows one or more zones along the blade span to be affected by erosion independently. It also accounts for uncertainties in the local airfoil aerodynamics by parameterizing the lift and drag coefficient curves. We used the model to generate a stochastic ensemble of degrading airfoil aerodynamic polars for forward aero-servo-elastic simulations, in which we computed the effect of leading edge erosion on the dynamic response of a wind turbine under varying turbulent inflow conditions. The dynamic response was chosen as the defining output because it relates to the output variable most commonly monitored under a structural health monitoring (SHM) regime.
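A minimal sketch of such a damage process follows. Its modelling choices are assumptions for illustration rather than the paper's exact formulation. Event times follow a non-homogeneous Poisson process with a hypothetical power-law intensity λ(t) = a·b·t^(b−1), simulated by thinning (Lewis and Shedler). Each event adds an exponentially distributed damage increment, so the cumulative damage is a compound process. That damage is mapped to discrete erosion states through hypothetical thresholds, and each spanwise zone is simulated independently. The function `degrade_polar` and its factors `dcl` and `dcd` are likewise hypothetical: a simple stand-in for a parameterization of the lift and drag curves, not the one used in this work.

```python
import random


def intensity(t, a=0.5, b=1.2):
    """Hypothetical power-law intensity lambda(t) = a * b * t**(b - 1), events per year.

    With b > 1 the rate increases over time, so erosion events become more frequent.
    """
    return a * b * t ** (b - 1) if t > 0 else 0.0


def simulate_nhpp(rate, t_end, rate_max, rng):
    """Event times of a non-homogeneous Poisson process on [0, t_end], by thinning.

    Requires rate(t) <= rate_max for all t in [0, t_end].
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)  # next candidate from a homogeneous process
        if t > t_end:
            return times
        if rng.random() <= rate(t) / rate_max:  # keep with probability rate(t) / rate_max
            times.append(t)


def damage_state(damage, thresholds):
    """Map cumulative damage to a discrete erosion state 0..len(thresholds)."""
    return sum(damage >= th for th in thresholds)


def simulate_zone(t_end, rng, jump_mean=0.1, thresholds=(0.2, 0.5, 1.0)):
    """Compound process for one spanwise zone: returns (event time, state) pairs."""
    rate_max = intensity(t_end)  # valid bound only because intensity is increasing (b >= 1)
    history, damage = [], 0.0
    for t in simulate_nhpp(intensity, t_end, rate_max, rng):
        damage += rng.expovariate(1.0 / jump_mean)  # random damage increment per event
        history.append((t, damage_state(damage, thresholds)))
    return history


def simulate_blade(n_zones, t_end, seed=0):
    """Simulate n_zones independent spanwise zones from one seeded generator."""
    rng = random.Random(seed)
    return [simulate_zone(t_end, rng) for _ in range(n_zones)]


def degrade_polar(cl, cd, state, dcl=0.05, dcd=0.25):
    """Hypothetical polar degradation: lift scaled down, drag scaled up, per state."""
    cl_new = [c * (1.0 - dcl * state) for c in cl]
    cd_new = [c * (1.0 + dcd * state) for c in cd]
    return cl_new, cd_new


if __name__ == "__main__":
    for i, zone in enumerate(simulate_blade(n_zones=3, t_end=20.0, seed=42)):
        final_state = zone[-1][1] if zone else 0
        print(f"zone {i}: {len(zone)} events, final state {final_state}")
```

Each zone's list of (time, state) pairs could then select a degraded polar for that zone at each point in time. Thinning is convenient here because it only needs an upper bound on λ(t), which for an increasing intensity is simply λ(t_end).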
In this context, we further propose an approach for spatio-temporal diagnostics of leading edge erosion, namely a deep learning attention-based Transformer, which we modified for classification tasks on slow degradation processes with long multivariate time-series sequences as inputs. We performed multiple sets of numerical experiments to evaluate the Transformer for diagnostics and to assess its limitations. The results showed Transformers to be a potent method for diagnosing such degradation processes. The attention mechanism allows the network to focus on different features at different time intervals for better prediction accuracy, especially for long time-series sequences representing a slow degradation process.
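The sketch below shows how attention lets a classifier weight different time steps. It is a minimal, untrained forward pass with three stages: single-head scaled dot-product self-attention over a (T × d) multivariate sequence, mean pooling over time, and a linear head that outputs probabilities over erosion severity classes. It is a generic illustration, not the modified Transformer used in this work. Positional encoding, multiple heads and layers, and training are omitted. All names (`self_attention`, `classify`, `random_params`) and the input shape are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (T x d_in) sequence."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(wk[0]))  # 1 / sqrt(d_model)
    scores = [[scale * sum(qi * ki for qi, ki in zip(q_row, k_row)) for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]  # each row: distribution over time steps
    return matmul(weights, v), weights


def classify(x, params):
    """Attention, then mean pooling over time, then a linear head -> class probabilities."""
    h, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logits = [sum(p * w for p, w in zip(pooled, w_col)) + bias
              for w_col, bias in zip(zip(*params["w_out"]), params["b_out"])]
    return softmax(logits), weights


def random_params(d_in, d_model, n_classes, seed=0):
    """Hypothetical untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
                for _ in range(rows)]

    return {"wq": mat(d_in, d_model), "wk": mat(d_in, d_model),
            "wv": mat(d_in, d_model), "w_out": mat(d_model, n_classes),
            "b_out": [0.0] * n_classes}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical input: 50 time steps of 4 response channels.
    x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
    probs, weights = classify(x, random_params(d_in=4, d_model=8, n_classes=4))
    print("severity class probabilities:", [round(p, 3) for p in probs])
```

Each row of the returned `weights` matrix is a probability distribution over time steps. This is the mechanism that lets the network emphasize particular intervals of a long, slowly degrading record instead of treating every time step equally.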