Abstract

Machine learning (ML) is perhaps the most useful tool for interpreting large genomic datasets. However, the performance of any single ML method in genomic selection (GS) is currently unsatisfactory. To improve genomic prediction, we constructed a stacking ensemble learning framework (SELF) that integrates three ML methods to predict genomic estimated breeding values (GEBVs). We evaluated the predictive ability of SELF on three real datasets with different genetic architectures, comparing the prediction accuracy of SELF with that of its base learners, genomic best linear unbiased prediction (GBLUP), and BayesB. For every trait, SELF outperformed its base learners: support vector regression (SVR), kernel ridge regression (KRR), and elastic net (ENET). Averaged over the three datasets, the prediction accuracy of SELF was 7.70% higher than that of GBLUP. SELF was also more robust than BayesB for all traits except milk fat percentage (MFP) in the German Holstein dairy cattle dataset. We therefore believe that SELF has the potential to be applied to estimating GEBVs in other animal and plant species.
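The stacking idea described above can be sketched with scikit-learn: the three named base learners (SVR, KRR, ENET) are fitted, and a meta-learner is trained on their out-of-fold predictions. This is a minimal sketch, not the authors' SELF implementation. The simulated genotypes, every hyperparameter value, and the choice of ordinary linear regression as the meta-learner are assumptions made for illustration. Prediction accuracy is computed as Pearson's correlation between the predicted values and the simulated phenotypes (which, unlike real data, need no correction for fixed effects).

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def simulate_genotypes(n_animals=300, n_snps=500, n_qtl=20, h2=0.5, seed=1):
    """Hypothetical data: 0/1/2 allele-count SNP genotypes and an additive phenotype."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, n_snps)  # allele frequency per SNP
    X = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, n_qtl, replace=False)  # a few causal SNPs
    effects[qtl] = rng.normal(0.0, 1.0, n_qtl)
    g = X @ effects  # true genetic values
    # Residual variance chosen so that var(g) / var(y) is roughly h2.
    e = rng.normal(0.0, np.sqrt(g.var() * (1 - h2) / h2), n_animals)
    return X, g + e


def build_stacking_model():
    """Stack SVR, KRR and ENET; all hyperparameters here are illustrative assumptions."""
    base_learners = [
        ("svr", SVR(kernel="rbf", C=1.0)),
        ("krr", KernelRidge(alpha=1.0, kernel="rbf")),
        ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)),
    ]
    # Assumed meta-learner: linear regression on 5-fold out-of-fold base predictions.
    return StackingRegressor(
        estimators=base_learners, final_estimator=LinearRegression(), cv=5
    )


X, y = simulate_genotypes()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = build_stacking_model().fit(X_train, y_train)
gebv = model.predict(X_test)  # predicted values for the validation animals
accuracy = np.corrcoef(gebv, y_test)[0, 1]  # Pearson correlation
print(f"Prediction accuracy (Pearson r): {accuracy:.3f}")
```

Training the meta-learner on cross-validated rather than in-sample predictions keeps it from simply favoring whichever base learner overfits the training set most.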

Highlights

  • Genomic selection (GS), first introduced by Meuwissen et al (2001), uses whole-genome marker information to predict genomic estimated breeding values (GEBVs)

  • The first application of GS was in dairy cattle, where it improved the selection of better-performing genotypes and accelerated genetic gain by shortening breeding cycles (Hayes et al, 2009a; Crossa et al, 2017; Tong et al, 2020)

  • Three important economic traits were selected for later analysis: live weight (LW), carcass weight (CW), and eye muscle area (EMA)


Summary

Introduction

Genomic selection (GS), first introduced by Meuwissen et al (2001), uses whole-genome marker information to predict genomic estimated breeding values (GEBVs). After more than 10 years of development, GS has been widely used in livestock and plant breeding programs with high prediction accuracy (Hayes et al, 2009a; Heffner et al, 2009). GS has also been applied to improve the prediction of complex disease phenotypes from genotype data (De Los Campos et al, 2010; Menden et al, 2013). A critical concern in genomic prediction is prediction accuracy, calculated as Pearson's correlation between the estimated breeding values and the corrected phenotypes. There is increasing interest in applying machine learning (ML) to genomic prediction. ML refers to computer programs that optimize a performance criterion using training data, making predictions or decisions without being explicitly programmed (Alpaydin, 2020). ML has been used in GS and may perform best at interpreting large-scale genomic data (De Los Campos et al, 2010). González-Camacho et al (2018)
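For reference, the prediction accuracy mentioned above is Pearson's correlation between the GEBVs $\hat{g}_i$ and the corrected phenotypes $y^{c}_i$ (typically phenotypes adjusted for fixed effects) of the $n$ validation individuals. The symbols are our own notation, not the source's:

```latex
r\left(\hat{g}, y^{c}\right) = \frac{\sum_{i=1}^{n} \left(\hat{g}_i - \bar{\hat{g}}\right)\left(y^{c}_i - \bar{y}^{c}\right)}{\sqrt{\sum_{i=1}^{n} \left(\hat{g}_i - \bar{\hat{g}}\right)^{2}}\,\sqrt{\sum_{i=1}^{n} \left(y^{c}_i - \bar{y}^{c}\right)^{2}}}
```

Here $\bar{\hat{g}}$ and $\bar{y}^{c}$ are the means of the GEBVs and corrected phenotypes over the validation set.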

