One suggested approach to improving the reproductive performance of dairy herds is the targeted management of subgroups of biologically similar animals, such as those with similar probabilities of becoming pregnant, termed pregnancy risk. We aimed to use readily available farm data to develop predictive models of pregnancy risk in dairy cows. Data from a convenience sample of 108 dairy herds in the UK were collated and each herd was randomly allocated, at a ratio of 80:20, to either training or testing data sets. Following data cleaning, 78 herds remained in the training data set and 20 herds in the testing data set. Data were further split by parity into nulliparous, primiparous, and multiparous subsets. An XGBoost model was trained to predict the insemination outcome in each parity subset, with predictors from farm records of breeding, calving, and milk recording. Training data comprised 74,511 inseminations in 45,909 nulliparous animals, 86,420 inseminations in 39,439 primiparous animals, and 158,294 inseminations in 32,520 multiparous animals. The final models were evaluated by predicting with the testing data, comprising 31,740 inseminations in 19,647 nulliparous animals, 38,588 inseminations in 16,215 primiparous animals, and 65,049 inseminations in 12,439 multiparous animals. Model discrimination was assessed by calculating the area under the receiver operating characteristic curve (AUC); model calibration was assessed by plotting calibration curves and compared across test herds by calculating the expected calibration error (ECE) in each test herd. The models were unable to discriminate between insemination outcomes with high accuracy, with an AUC of 0.63, 0.59, and 0.62 in the nulliparous, primiparous, and multiparous subsets, respectively. The models were generally well-calibrated, meaning the model-predicted pregnancy risks were similar to the observed pregnancy risks. The mean (SD) ECE in the test herds was 0.038 (0.023), 0.028 (0.012), and 0.020 (0.008) in the nulliparous, primiparous, and multiparous subsets, respectively. The predictive models reported here could theoretically be used to identify subgroups of animals with similar pregnancy risk to facilitate targeted reproductive management, or to provide information about cows' relative pregnancy risk compared with the herd average, which may support on-farm decision-making. Further research is needed to evaluate the generalizability of these predictive models and to understand the source of variation in ECE between herds; however, this study demonstrates that it is possible to accurately predict pregnancy risk in dairy cows using readily available farm data.
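To make the calibration metric concrete, the following is a minimal sketch of how a per-herd ECE, and its mean (SD) across herds, could be computed. The abstract does not state the binning scheme, so the ten equal-width bins used here are an assumption, as are the function names and the `(herd_id, outcome, predicted_risk)` record layout; this is an illustration, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, stdev


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE with equal-width probability bins (assumed binning scheme).

    Each non-empty bin contributes |observed pregnancy rate - mean predicted
    risk|, weighted by the share of inseminations that fall in that bin.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")

    bins = [[] for _ in range(n_bins)]
    for outcome, risk in zip(y_true, y_prob):
        # A predicted risk of exactly 1.0 is placed in the last bin.
        index = min(int(risk * n_bins), n_bins - 1)
        bins[index].append((outcome, risk))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = mean(outcome for outcome, _ in members)
        predicted = mean(risk for _, risk in members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


def ece_by_herd(records, n_bins=10):
    """Compute ECE separately for each herd.

    `records` is a hypothetical iterable of (herd_id, outcome, predicted_risk)
    tuples, where outcome is 1 for a pregnancy and 0 otherwise.
    """
    grouped = defaultdict(lambda: ([], []))
    for herd_id, outcome, risk in records:
        grouped[herd_id][0].append(outcome)
        grouped[herd_id][1].append(risk)
    return {
        herd_id: expected_calibration_error(outcomes, risks, n_bins)
        for herd_id, (outcomes, risks) in grouped.items()
    }


def summarise_ece(herd_ece):
    """Return the mean and sample SD of ECE across herds (needs >= 2 herds)."""
    values = list(herd_ece.values())
    return mean(values), stdev(values)
```

Under these assumptions, calling `summarise_ece(ece_by_herd(records))` on the test-set predictions for one parity subset would yield a mean (SD) pair of the kind reported above. A low ECE indicates only that predicted and observed risks agree on average within bins; it says nothing about discrimination, which is why AUC is reported separately.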