Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data

Lucio F.M Mota, Sara Pegolo, Toshimi Baba, Francisco Peñagaricano, Gota Morota, Giovanni Bittante, Alessio Cecchinato

doi:10.3168/jds.2020-19861


Open Access

https://doi.org/10.3168/jds.2020-19861


Abstract

Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank reduction, and variable selection, and that can model nonlinear relations between phenotype and FTIR spectra, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes differing in terms of biological meaning and relationships with milk composition (i.e., phenotypes measurable directly and not directly in milk, reflecting different biological processes which can be captured using milk spectra) in Holstein-Friesian cattle under 2 cross-validation scenarios. The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split considering 2 cross-validation scenarios: samples-out random, in which the population was randomly split into 10 folds (8 folds for training and 1 fold each for validation and testing); and herd/date-out, in which the population was randomly assigned to training (70% of herds), validation (10%), and testing (20% of herds) based on the herd and date on which the samples were collected. A random grid search for hyperparameter optimization was performed on the training subset, and the validation set was used to estimate the generalization of the prediction error. The trained model was then used to assess the final prediction in the testing subset.
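The herd/date-out scenario described above keeps every sample from a given herd and sampling date in the same subset. The following is a minimal sketch of that idea using only the Python standard library; it is not the authors' code. The function name `herd_date_out_split`, the group labels, and the toy data are hypothetical, and the 70/10/20 proportions are applied to groups rather than to individual cows.

```python
import random
from collections import defaultdict


def herd_date_out_split(groups, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole herd/date groups to train, validation, and test subsets.

    groups: one hashable label per sample, e.g. (herd_id, sampling_date).
    Returns a dict mapping subset name to a sorted list of sample indices.
    """
    # Collect the sample indices that belong to each herd/date group.
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Sort before shuffling so the result depends only on the seed.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)

    # Cut the shuffled group list into three consecutive blocks.
    n_train = round(fractions[0] * len(labels))
    n_val = round(fractions[1] * len(labels))
    parts = {
        "train": labels[:n_train],
        "validation": labels[n_train:n_train + n_val],
        "test": labels[n_train + n_val:],
    }
    return {
        name: sorted(i for g in chosen for i in by_group[g])
        for name, chosen in parts.items()
    }


# Hypothetical data: 20 herd/date groups with 5 cows each (not the study data).
groups = [("herd%02d" % h, "2020-01-%02d" % (h + 1))
          for h in range(20) for _ in range(5)]
split = herd_date_out_split(groups)
print({name: len(idx) for name, idx in split.items()})
```

Because whole groups are moved together, no herd/date combination seen in training can reappear in validation or testing, which is what makes this scenario a stricter check than the samples-out random split.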
The grid search for penalized regression showed that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (standard model) was compared against 2 machine learning techniques and penalized regression using 2 cross-validation scenarios. Machine learning methods showed a greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. In herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform the other methods in predictive ability, by around 4%, 1%, and 7% relative to EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar, and differed statistically from the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences were observed in terms of predictive ability due to the large standard deviation observed for predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.
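The abstract does not state how the percentage advantages of GBM were computed. As a worked check under one plausible assumption (relative gains averaged over the 3 traits and both cross-validation scenarios), the GBM and RF values reported above can be compared directly. The variable and function names below are illustrative only.

```python
# Predictive abilities reported in the abstract, ordered BCS, BHB, kappa-CN.
reported = {
    "samples-out": {"GBM": [0.63, 0.80, 0.81], "RF": [0.61, 0.79, 0.80]},
    "herd/date-out": {"GBM": [0.58, 0.73, 0.77], "RF": [0.58, 0.73, 0.77]},
}


def relative_gains(scenario):
    """Relative gain of GBM over RF for each trait, in percent."""
    return [100.0 * (g / r - 1.0)
            for g, r in zip(scenario["GBM"], scenario["RF"])]


# Assumption: pool the per-trait gains from both scenarios and average them.
all_gains = [gain
             for scenario in reported.values()
             for gain in relative_gains(scenario)]
mean_gain_pct = sum(all_gains) / len(all_gains)
print(f"Mean GBM gain over RF: {mean_gain_pct:.2f}%")
```

Under this assumption the mean gain comes out just under 1%, which is in line with the roughly 1% advantage over RF quoted in the abstract. The EN and PLS values are not given in the abstract, so the 4% and 7% figures cannot be checked in the same way.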


Journal: Journal of Dairy Science
Publication Date: Apr 15, 2021
Citations: 20
License type: elsevier-specific: oa user license

