Prediction of subacute ruminal acidosis based on milk fatty acids: A comparison of linear discriminant and support vector machine approaches for model development

E Colman, W Waegeman, B De Baets, V Fievez

doi:10.1016/j.compag.2015.01.002

Abstract

Subacute ruminal acidosis (SARA), characterized by low rumen pH, is one of the most important metabolic disorders in dairy cattle. Because dairy cows experiencing SARA often do not show overt clinical symptoms, diagnostic biomarkers in milk are of interest. Data from six acidosis induction experiments with rumen-fistulated dairy cows were combined to assess the potential of milk fatty acids (FA) to identify acidotic cases, based on three threshold values often reported in the literature: time with pH < 5.6 of 180 min/d or 283 min/d, and time with pH < 5.8 of 475 min/d (N = 442 cases, of which 111–165 were acidotic, depending on the applied threshold). Both linear discriminant analysis (LDA) and support vector machines (SVM) were used to develop classification models, with SVM based on two common types of kernel function (linear kernels and Gaussian radial basis function kernels), and with models including either the whole milk FA profile (41–69 milk FA, depending on the experiment) or a selected subset of milk FA (odd- and branched-chain FA and biohydrogenation derivatives of polyunsaturated FA; 13–16 FA). Both the performance of individual classification models and the comparison between models were evaluated using the area under the receiver operating characteristic (ROC) curve. Non-linear models developed with a radial-kernel SVM appeared of particular interest when all milk FA were included as model features. However, linear models based on the selected group of milk FA most often performed as well as the non-linear models including all milk FA, while being less time-consuming and more cost-effective from both a computational and an analytical perspective. Nevertheless, combining all datasets only yielded good classification models when data from each dataset were included in model training; model performance decreased dramatically under cross-dataset cross-validation, i.e. when models were validated on a dataset not used for training. This indicates that the origin of the datasets has an important impact on model performance, which should be taken into account in further exploration of prediction models for SARA.
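
Each of the three SARA definitions reduces to a per-day rule: count the minutes with rumen pH below a threshold and compare that total with a minimum duration. The Python sketch below illustrates this labelling step under stated assumptions. pH is assumed to be logged at a fixed interval (the 10-minute interval in the usage line is hypothetical), and a day is treated as acidotic when the time below the threshold is at least the listed duration (">=" rather than ">", which the abstract does not specify). The names SARA_DEFINITIONS, minutes_below and label_case are illustrative and are not taken from the paper.

```python
# The three SARA definitions listed in the abstract:
# name -> (pH threshold, minimum minutes per day below that threshold).
SARA_DEFINITIONS = {
    "pH<5.6 for 180 min/d": (5.6, 180),
    "pH<5.6 for 283 min/d": (5.6, 283),
    "pH<5.8 for 475 min/d": (5.8, 475),
}


def minutes_below(ph_readings, threshold, interval_min):
    """Minutes in one day with rumen pH strictly below `threshold`, assuming
    one reading every `interval_min` minutes (hypothetical logging setup)."""
    return sum(interval_min for ph in ph_readings if ph < threshold)


def label_case(ph_readings, interval_min):
    """Map each definition name to True (acidotic) or False. Assumption: a
    day is acidotic when time below the threshold is >= the minimum duration."""
    return {
        name: minutes_below(ph_readings, threshold, interval_min) >= min_minutes
        for name, (threshold, min_minutes) in SARA_DEFINITIONS.items()
    }


# Hypothetical day logged every 10 min (144 readings): 200 min at pH 5.5.
day = [5.5] * 20 + [6.2] * 124
print(label_case(day, interval_min=10))
# -> {'pH<5.6 for 180 min/d': True, 'pH<5.6 for 283 min/d': False, 'pH<5.8 for 475 min/d': False}
```

The same day can be acidotic under one definition and not under another, which is why the number of acidotic cases ranges from 111 to 165 depending on the applied threshold.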

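Both individual model performance and model comparisons were based on the area under the ROC curve. One standard way to compute it, shown here as a minimal sketch rather than the authors' implementation, uses its rank (Mann-Whitney) interpretation: the AUC equals the probability that a randomly chosen acidotic case receives a higher classifier score than a randomly chosen non-acidotic case, with ties counted as one half. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The function name roc_auc and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation:
    the fraction of (acidotic, non-acidotic) case pairs in which the acidotic
    case gets the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one acidotic and one non-acidotic case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (higher = more likely acidotic) and true labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # -> 0.75
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is negligible for a few hundred cases such as the N = 442 pooled here; a sort-based rank formula gives the same value faster on large data.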
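The contrast between training on data from every dataset and cross-dataset cross-validation comes down to how cases are split into training and validation sets. The sketch below is an illustration under assumptions, not the paper's exact protocol: it builds both kinds of split from a per-case list of experiment identifiers (the identifiers "A", "B" and "C" are hypothetical; the study combined six experiments). In the pooled scheme each dataset's cases are dealt round-robin over the folds, so every training set contains cases from every dataset. In the leave-one-dataset-out scheme the validation set is an entire experiment that the model never saw during training.

```python
from collections import defaultdict


def _indices_by_dataset(dataset_ids):
    """Group case indices by the experiment (dataset) they came from."""
    groups = defaultdict(list)
    for i, d in enumerate(dataset_ids):
        groups[d].append(i)
    return groups


def leave_one_dataset_out(dataset_ids):
    """Cross-dataset split: yield (held_out, train_idx, test_idx) so that each
    experiment is used once as the validation set and never in training."""
    groups = _indices_by_dataset(dataset_ids)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = [i for i, d in enumerate(dataset_ids) if d != held_out]
        yield held_out, train_idx, test_idx


def pooled_kfold(dataset_ids, k):
    """Pooled split: yield (train_idx, test_idx) for k folds. Each dataset's
    cases are dealt round-robin over the folds, so with k >= 2 and at least
    two cases per dataset every training set contains data from every dataset.
    Assignment is deterministic here; a real analysis would shuffle cases
    within each dataset first (hypothetical simplification)."""
    folds = [[] for _ in range(k)]
    for _, idx in sorted(_indices_by_dataset(dataset_ids).items()):
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical experiment labels for nine cases.
ids = ["A"] * 3 + ["B"] * 2 + ["C"] * 4
for held_out, train_idx, test_idx in leave_one_dataset_out(ids):
    print("held out", held_out, "train", train_idx, "test", test_idx)
for train_idx, test_idx in pooled_kfold(ids, 2):
    print("pooled train", train_idx, "test", test_idx)
```

Fitting a classifier (LDA or SVM) on each training set and computing the AUC on the corresponding held-out cases gives the two evaluation settings whose results the abstract contrasts. This sketch ignores balancing of acidotic and non-acidotic cases across folds.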