Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle

E.M.M Van Der Heide, R.F Veerkamp, M.L Van Pelt, C Kamphuis, I Athanasiadis, B.J Ducro

doi:10.3168/jds.2019-16295

Open Access

https://doi.org/10.3168/jds.2019-16295

Abstract

In this study, we compared multiple logistic regression, a linear method, with naive Bayes and random forest, 2 nonlinear machine-learning methods. We used all 3 methods to predict individual survival to second lactation in dairy heifers. The data set used for prediction contained 6,847 heifers with known survival outcomes, born between January 2012 and June 2013. Each animal had 50 genomic estimated breeding values available at birth and up to 65 phenotypic variables that accumulated over time. Survival was predicted at 5 moments in life: at birth, at 18 mo, at first calving, at 6 wk after first calving, and at 200 d after first calving. The data sets were randomly split into 70% training and 30% testing sets to evaluate model performance in a 20-fold validation. The methods were compared for accuracy, sensitivity, specificity, area under the curve (AUC) value, contrasts between groups for the prediction outcomes, and increase in surviving animals in a practical scenario. At birth and 18 mo, all methods had overlapping performance; no method significantly outperformed the others. At first calving, 6 wk after first calving, and 200 d after first calving, random forest and naive Bayes had overlapping performance, and both machine-learning methods outperformed multiple logistic regression. Overall, naive Bayes had the highest average AUC at all decision points up to 200 d after first calving. Random forest had the highest AUC at 200 d after first calving. All methods obtained similar increases in survival in the practical scenario. Despite this, the methods appeared to predict the survival of individual heifers differently. All methods improved over time, but the changes in mean model outcomes for surviving and non-surviving animals differed by method. Furthermore, the correlations of individual predictions between methods ranged from r = 0.417 to r = 0.700; the lowest correlations were at first calving for all methods.
In short, all 3 methods were able to predict survival at a population level, because all methods improved survival in a practical scenario. However, predictions for individual animals differed considerably between methods.
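
The comparison measures named in the abstract (accuracy, sensitivity, specificity, AUC, and the correlation of individual predictions between methods) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the 0.5 classification threshold, the coding of survival as 1, and the use of one seed per split are all assumptions made for illustration, and the sketch uses only the Python standard library rather than any particular modelling package.

```python
import random
from statistics import mean


def random_split(n_animals, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) for one random 70/30 split (hypothetical helper)."""
    idx = list(range(n_animals))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n_animals))
    return idx[:cut], idx[cut:]


def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity, with survival (1) as the positive class.

    Assumes both classes occur in y_true; the threshold is an illustrative choice.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, y_score):
    """AUC as the probability that a random survivor outscores a random non-survivor.

    Ties count as half a win. Assumes both classes occur in y_true.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(a, b):
    """Pearson correlation between two methods' predictions for the same animals."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    # Made-up outcomes and scores for 8 animals; not data from the study.
    y = [1, 1, 1, 0, 1, 0, 1, 0]
    method_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    method_b = [0.7, 0.9, 0.4, 0.5, 0.8, 0.1, 0.6, 0.3]
    print(confusion_metrics(y, method_a), auc(y, method_a))
    print(pearson_r(method_a, method_b))
    # 20 repeated random splits, one seed per repetition (an assumption).
    splits = [random_split(len(y), seed=k) for k in range(20)]
    print(len(splits))
```

Two methods can reach similar AUC values on the same test set while their per-animal scores correlate only moderately, which is the kind of disagreement between methods that the abstract reports.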

Journal: Journal of Dairy Science
Publication Date: Aug 22, 2019
Citations: 50
