Performance comparison of machine learning models used for predicting subclinical mastitis in dairy cows: Bagging, boosting, stacking, and super-learner ensembles versus single machine learning models

A Satoła, K Satoła

doi:10.3168/jds.2023-24243

Abstract

Mastitis has a substantial impact on the dairy industry across the world, causing dairy producers to suffer losses due to the reduced quality and quantity of produced milk. A related problem is the excessive use of antibiotics, which leads to the development of resistance in different bacterial strains. Growing consumer awareness of food safety and the rational use of antibiotics has promoted the search for new methods of early identification of cows that may be at risk of developing the disease. Subclinical mastitis does not cause any visible changes to the udder or milk, and therefore it is more difficult to detect than clinical mastitis. The collection of large amounts of data related to the milk performance of cows makes it possible to use machine learning (ML) methods to build models for classifying cows as healthy or at risk of subclinical mastitis. The data used for the purpose of this study included information from routine milk recording procedures. The data set consisted of 19,856 records of 2,227 Polish Holstein-Friesian cows from 3 herds. The authors built ensemble ML models, in particular bagging, boosting, stacking, and super learner models, and compared their accuracy in identifying disease-affected cows against single ML models based on the Support Vector Machines, Logistic Regression, Gaussian Naïve Bayes, k-Nearest Neighbors, and Decision Tree algorithms. The models were trained and evaluated on the information recorded for herd 1, using an 80:20 train-test split ratio according to animal ID (to avoid data leakage). The information recorded for herds 2 and 3 was used only to evaluate, on unseen data, the models developed using the herd 1 data set.
Among the single ML models, the Support Vector Machines model was found to be the most accurate in predicting subclinical mastitis at the subsequent test day, both on the training set (mean F1-score of 0.760) and on the testing sets containing data for herds 1, 2, and 3 (F1-score of 0.778, 0.790, and 0.741, respectively). The Gradient Boosting model was found to be the best performing model among the ensemble ML models (F1-score of 0.762, 0.779, 0.791, and 0.723 for the training set and the testing sets, respectively). The super learner model, featuring the most advanced design and Logistic Regression in the meta layer, achieved the highest mean F1-score of 0.775 during cross-validation; however, its prediction accuracy on the testing sets was slightly worse (mean F1-score of 0.768, 0.790, and 0.693 for herds 1, 2, and 3, respectively). The study findings confirm the promising role of ensemble ML methods, which were found to be slightly superior to most of the single ML models.
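
The split by animal ID and the F1-score used for comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_by_animal` and `f1_score`, the record layout, and the toy data are all hypothetical, and the 80:20 ratio is assumed here to apply to animals rather than to individual records, so the record proportions are only approximately 80:20.

```python
import random


def split_by_animal(records, test_fraction=0.2, seed=0):
    """Split records so that all records of one animal fall on the same side.

    Assumption: each record is a dict with an "animal_id" key (hypothetical layout).
    """
    animal_ids = sorted({r["animal_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(animal_ids)
    # Hold out a fraction of the animals, not a fraction of the records.
    n_test = max(1, round(len(animal_ids) * test_fraction))
    test_ids = set(animal_ids[:n_test])
    train = [r for r in records if r["animal_id"] not in test_ids]
    test = [r for r in records if r["animal_id"] in test_ids]
    return train, test


def f1_score(y_true, y_pred):
    """F1-score for binary labels, where 1 marks the positive (at-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 10 cows with 4 test-day records each (made up for illustration).
records = [
    {"animal_id": cow, "test_day": day}
    for cow in range(10)
    for day in range(4)
]
train, test = split_by_animal(records)
train_ids = {r["animal_id"] for r in train}
test_ids = {r["animal_id"] for r in test}
```

Because the split is made on animals, no cow contributes records to both sets, which is the leakage the abstract refers to: repeated test-day records of the same cow are correlated, so a record-level random split would let the model see the test animals during training.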

Full Text