The growing use of automated systems in the dairy industry generates a vast amount of cow-level data daily, creating opportunities to use these data to support real-time decision-making. Currently, various commercial systems offer built-in alert algorithms to identify cows requiring attention. To our knowledge, no study has compared the predictive ability of models that account for herd-level variability with that of automated systems. Long Short-Term Memory (LSTM) models are machine learning models capable of learning temporal patterns and making predictions based on time series data. The objective of our study was to evaluate the ability of LSTM models to identify a health alert associated with a ketosis diagnosis (HAK) using deviations of daily milk yield, milk fat-to-protein ratio (FPR), number of successful milkings, rumination time, and activity index from the herd median by parity and days in milk (DIM), considering various time series lengths and numbers of d before HAK. Additionally, we aimed to use an explainable artificial intelligence method to understand the relationships between input variables and model outputs. Data on daily milk yield, milk FPR, number of successful milkings, rumination time, activity, and health events during 0 to 21 DIM were retrospectively obtained from a commercial Holstein dairy farm in northern Indiana from February 2020 to January 2023. A total of 1,743 cows were included in the analysis (non-HAK = 1,550; HAK = 193). Variables were transformed into deviations from the herd median by parity and DIM. Six LSTM models were developed to identify HAK 1, 2, and 3 d before farm diagnosis using historical cow-level data with varying time series lengths. Model performance was assessed using repeated stratified 10-fold cross-validation with 20 repeats. The Shapley additive explanations (SHAP) framework was used for model explanation.
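The transformation into deviations from the herd median by parity and DIM can be illustrated with a minimal Python sketch. This is not the study's code: the column names (`cow_id`, `parity`, `dim`, `milk_yield`, `rumination`), the helper `deviation_from_herd_median`, and the toy records are hypothetical, and the sketch assumes each parity value forms its own group, which the abstract does not specify.

```python
import pandas as pd


def deviation_from_herd_median(df, value_cols, group_cols=("parity", "dim")):
    """Return a copy of df in which each value column is replaced by its
    deviation from the median of all rows sharing the same grouping keys
    (here assumed to be parity and DIM)."""
    out = df.copy()
    # transform("median") broadcasts each group's median back to its rows
    medians = df.groupby(list(group_cols))[list(value_cols)].transform("median")
    out[list(value_cols)] = df[list(value_cols)] - medians
    return out


# Toy records (hypothetical values, not study data): six cows at 5 DIM
records = pd.DataFrame({
    "cow_id": [1, 2, 3, 4, 5, 6],
    "parity": [1, 1, 1, 2, 2, 2],
    "dim": [5, 5, 5, 5, 5, 5],
    "milk_yield": [20.0, 24.0, 28.0, 30.0, 36.0, 45.0],
    "rumination": [400.0, 450.0, 470.0, 380.0, 500.0, 520.0],
})

deviations = deviation_from_herd_median(records, ["milk_yield", "rumination"])
print(deviations)
```

Under these assumptions, a negative value means the cow is below herdmates of the same parity and DIM on that variable, and a sequence of such daily deviations would form one input time series for a model.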
Model accuracy was 83, 74, and 70%, balanced error rate was 17 to 18, 26 to 28, and 34%, sensitivity was 81 to 83, 71 to 74, and 62%, specificity was 83, 74, and 71%, positive predictive value was 38, 25 to 27, and 21%, negative predictive value was 97 to 98, 95 to 96, and 94%, and area under the curve was 0.89 to 0.90, 0.80 to 0.81, and 0.72 for models identifying HAK 1, 2, and 3 d before diagnosis, respectively. Performance declined as the time interval between identification and farm diagnosis increased, and extending the time series length did not improve model performance. Model explanation revealed that cows with lower milk yield, fewer successful milkings, less rumination time, lower activity, and higher milk FPR than herdmates of the same parity and DIM were more likely to be classified as HAK. Our results demonstrate the potential of LSTM models to identify HAK using deviations of daily milk production variables, rumination time, and activity index from the herd median by parity and DIM. Future studies are needed to compare health alerts from LSTM models that control for herd-specific metrics with commercial built-in algorithms across multiple farms and for other disorders.
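To show how the reported metrics relate to one another, the following Python sketch computes them from a confusion matrix. It assumes the usual definition of balanced error rate as 1 minus the mean of sensitivity and specificity, which the abstract does not state. The counts are hypothetical: they were chosen only to match the cohort's class sizes (193 HAK, 1,550 non-HAK) and are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)  # share of HAK cows that were flagged
    specificity = tn / (tn + fp)  # share of non-HAK cows that were not flagged
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definition: 1 - balanced accuracy
        "balanced_error_rate": 1 - (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),  # share of alerts that were true HAK
        "npv": tn / (tn + fn),  # share of non-alerts that were truly non-HAK
    }


# Hypothetical counts: 158 + 35 = 193 HAK cows, 1287 + 263 = 1550 non-HAK cows
metrics = classification_metrics(tp=158, fp=263, tn=1287, fn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With HAK cows making up about 11% of the cohort, even a modest false-positive rate among the much larger non-HAK group produces more false alerts than true ones, which is why positive predictive value can stay low while negative predictive value stays high.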