Abstract

Clinical mastitis (CM) incidence is considerable in terms of cows affected per year, but cases are much less common in terms of detections per cow per milking. From a modeling perspective, where predictions are made every time any cow is milked, low CM incidence per cow-day makes training, evaluating, and applying CM prediction models a challenge. The objective of this study was to build models for predicting CM incidence using time-series sensor data and choose models that maximize net return based on a cost matrix. Data were collected from 2 university dairy farms (the University of Florida and Virginia Polytechnic Institute and State University) to obtain a representative data set comprising 110,156 milkings and 333 CM cases. Variables used in the models were milk yield, protein, lactose, fat, electrical conductivity, days in milk, lactation number, and activity as the number of steps, lying time, lying bouts, and lying bout duration. Models that predicted the likelihood of CM caused by either gram-negative (GN) or gram-positive (GP) bacteria on each day were derived using extreme gradient boosting with weighting favoring true-positive cases, logistic responses, and log-loss errors. Model accuracies were determined using data randomly held out from the training set on each run. All variables considered were in terms of change (slope) over previous days, including the day CM was visually detected. The GN models had a median sensitivity (Se) of 52.6% and specificity (Sp) of 99.8%, whereas the GP models had a median Se of 37.5% and Sp of 99.9% when tested on the held-out data. In our models optimized to reduce cost from predictions, the Se was much less than Sp, suggesting that CM models might benefit from greater model weighting placed on Sp. Results also highlight the importance of positive predictive value (true-positive cases per predicted positive case) along with Sp and Se, as models built on sparse data tend to predict too many false-positive cases.
The calculated partial net returns of our GN and GP models were -$0.15 and -$0.10 per cow per lactation, respectively, whereas International Organization for Standardization (ISO) standard models with Se of 80% and Sp of 99% would return -$1.32 per cow per lactation. Models chosen that minimized the cost to the farmer differed markedly from models that met ISO guidelines, showing asymmetry in targets between Sp and Se when the disease incidence rate is low. Because of the unique challenges that low-incidence diseases like CM present, we recommend that future CM predictive models consider the economic and practical implications in addition to the traditional model evaluation metrics.

Highlights

  • Costs associated with mastitis in the dairy cow are estimated to be $2 billion in the United States annually

  • As clinical mastitis (CM) is a disease with low incidence, detection using a model with International Organization for Standardization (ISO) standard performance in terms of Sp and Se will produce more false alerts than a model that predicts a more prevalent disease

  • Although guidelines for CM prediction models tend to focus on Se and Sp values, other measures, including positive predictive value (PPV), may be crucial due to the low incidence and high false-positive rate in prediction

Introduction

Costs associated with mastitis in the dairy cow are estimated to be $2 billion in the United States annually. Mastitis incidence in the United States is estimated to be between 25 and 41 cases per 100 cows per lactation, equivalent to roughly one clinical mastitis (CM) case per 890 (Pol and Ruegg, 2007) to 1,460 (USDA, 2014) days per cow (0.07% to 0.1%). This small proportion of cases relative to healthy checks poses a challenge to modelers attempting to predict CM. Dominiak and Kristensen (2017) show how any arbitrary error rate can be achieved with fixed Se and Sp by lowering the prevalence. This idea that prevalence is an important factor in evaluating disease prediction models, along with Se and Sp, is incongruous with the practice of using only Se and Sp as gold standards. By ignoring prevalence in cases of low-prevalence diseases such as CM, we are guaranteeing models with greater error rates than for diseases with greater prevalence.
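
The dependence of error rate on prevalence can be illustrated with a short calculation. The sketch below is not taken from the study; it applies the standard Bayes relationship between Se, Sp, prevalence, and positive predictive value (PPV), using the ISO-level Se of 80% and Sp of 99% and the per-cow-day incidences quoted above. The function names and the two higher prevalence values are assumptions chosen only for comparison.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """Positive predictive value: true positives per predicted positive."""
    true_pos = se * prevalence                    # diseased and alerted
    false_pos = (1.0 - sp) * (1.0 - prevalence)   # healthy but alerted
    return true_pos / (true_pos + false_pos)


def false_alerts_per_true_alert(se: float, sp: float, prevalence: float) -> float:
    """Expected number of false alerts for every true alert."""
    return ((1.0 - sp) * (1.0 - prevalence)) / (se * prevalence)


# ISO-level performance quoted in the abstract.
SE, SP = 0.80, 0.99

# First two prevalences: one CM case per 1,460 and per 890 cow-days (from the text).
# Last two: hypothetical, more prevalent diseases, included only for contrast.
for prev in (1 / 1460, 1 / 890, 0.01, 0.10):
    print(
        f"prevalence={prev:.4%}  "
        f"PPV={ppv(SE, SP, prev):.1%}  "
        f"false alerts per true alert={false_alerts_per_true_alert(SE, SP, prev):.1f}"
    )
```

With Se and Sp held fixed, PPV falls as prevalence falls, which is why a model that meets the Se and Sp targets can still generate mostly false alerts at CM-level incidence.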
