Abstract

The present study compares Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and genome-wide association studies (GWAS) for selecting optimal subsets of single nucleotide polymorphisms (SNPs) to be used in genomic prediction in cattle. Data were simulated with QMSim software for 6,000 animals and 47,841 SNPs, comprising 43,633 polygenic markers and 4,208 quantitative trait loci (QTL). Genomic prediction was conducted with the best linear unbiased prediction (BLUP) method using the BLUPF90 program. Prediction accuracy was computed in three forms: Empirical all SNPs, Empirical QTL, and a theoretical accuracy based on prediction error variance (Accuracy PEV). Among the three models, GBM gave the highest Empirical all SNPs accuracy (0.79), followed by XGBoost (0.77) and GWAS (0.76). The Empirical QTL accuracy was almost equal for all three models. The highest theoretical accuracy was obtained for GWAS (0.93), whereas GBM and XGBoost reached 0.86 and 0.85, respectively. Our results indicate that all three models performed comparably in genomic prediction; however, the subsets selected by both GBM and GWAS gave higher prediction accuracies than the whole SNP set. The number of QTL selected as a proportion of the total number of SNPs was highest for GWAS. These observations can be validated using real data, which could enable further optimization of the analysis process.
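The abstract does not define its accuracy measures, so the following is only a minimal sketch of two conventions commonly used in genomic prediction, not necessarily the authors' procedure. It assumes that an empirical accuracy is the Pearson correlation between predicted and simulated true breeding values, and that the PEV-based accuracy follows the standard relation r = sqrt(1 - PEV / sigma_a^2). The function names and all numbers in the usage lines are hypothetical.

```python
import math


def empirical_accuracy(predicted, true_values):
    """Pearson correlation between predicted and true breeding values.

    Assumption: this is the usual definition of empirical accuracy in
    simulation studies; the abstract itself does not spell it out.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(true_values) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true_values))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    return cov / math.sqrt(var_p * var_t)


def pev_accuracy(pev, additive_variance):
    """Theoretical accuracy from prediction error variance (PEV).

    Uses r = sqrt(1 - PEV / sigma_a^2), clamped at zero so that a PEV
    slightly above the additive variance does not raise a math error.
    """
    return math.sqrt(max(0.0, 1.0 - pev / additive_variance))


# Hypothetical values, for illustration only (not taken from the study).
r_empirical = empirical_accuracy([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_theoretical = pev_accuracy(pev=0.19, additive_variance=1.0)
```

Under these assumptions the theoretical accuracy depends only on the model's own error variance, whereas the empirical accuracy needs known true breeding values, which are available here because the data are simulated.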
