Mastitis has detrimental effects on the world's dairy industry, reducing animal health, milk production and quality, and farmers' income. In addition, consumers' growing interest in food safety and the rational use of antibiotics highlights the need to develop novel strategies to improve mastitis detection, prevention, and management. In the present study, we applied machine learning (ML) analyses to predict the presence or absence of subclinical mastitis in Italian Mediterranean buffaloes, exploiting information collected the previous month during routine milk recording procedures, as well as climatic data. The data set included 3,891 records of 1,038 buffaloes from 6 herds located in the Basilicata region (southern Italy). Prediction models were developed using 4 different ML algorithms (Generalized Linear Model, Support Vector Machine, Random Forest, and Neural Network) and 2 data set splitting approaches for the creation of the training and test sets (by record or by animal ID number, always with 80% of the data used for model training and the remaining 20% for model testing). Support Vector Machine was the best method to predict high or low somatic cell count at the subsequent test-day record in the validation set, and therefore it was used to estimate the contribution of each feature to the best model. Regardless of the data set splitting approach, the most important features were somatic cell score, differential somatic cell count, electrical conductivity, and milk production. Among the climatic variables, the most informative were temperature and relative humidity. When the data were split by animal ID, an improvement in the models' predictive performance on the test set was observed, suggesting that this is the most appropriate splitting approach for data sets with repeated measures, because it avoids data leakage. According to different metrics, Neural Network was the best method for making predictions on the test set.
Our findings confirmed the promising role of ML methods in improving the prevention and surveillance of subclinical mastitis, exploiting the large amount of data currently available to identify animals likely to have a high somatic cell count the following month.
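The difference between the two splitting approaches described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the field names `animal_id` and `test_day`, the toy data, and the helper functions are all hypothetical, and only the Python standard library is used. The sketch contrasts an 80/20 split by record, where test-day records of the same animal can land in both sets, with an 80/20 split by animal ID, where each animal's records stay together.

```python
import random


def split_by_record(records, test_fraction=0.2, seed=0):
    """Shuffle individual records and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_animal(records, test_fraction=0.2, seed=0):
    """Hold out a fraction of the animals, keeping all records of an animal together."""
    rng = random.Random(seed)
    animal_ids = sorted({r["animal_id"] for r in records})
    rng.shuffle(animal_ids)
    n_test = round(len(animal_ids) * test_fraction)
    test_ids = set(animal_ids[:n_test])
    train = [r for r in records if r["animal_id"] not in test_ids]
    test = [r for r in records if r["animal_id"] in test_ids]
    return train, test


def shared_animals(train, test):
    """Return the animal IDs that appear in both the training and the test set."""
    return {r["animal_id"] for r in train} & {r["animal_id"] for r in test}


# Toy data (hypothetical): 50 animals with 4 test-day records each.
records = [{"animal_id": a, "test_day": d} for a in range(50) for d in range(4)]

train_r, test_r = split_by_record(records)
train_a, test_a = split_by_animal(records)

print("animals shared across sets, split by record:", len(shared_animals(train_r, test_r)))
print("animals shared across sets, split by animal:", len(shared_animals(train_a, test_a)))
```

With a split by animal ID, no animal contributes records to both sets, so the model is always tested on animals it has not seen during training. Note that when animals have unequal numbers of records, holding out 20% of the animals yields only approximately 20% of the records. Libraries such as scikit-learn offer group-aware splitters (for example `GroupShuffleSplit`) that serve the same purpose.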