Predictive ability of Random Forests, Boosting, Support Vector Machines and Genomic Best Linear Unbiased Prediction in different scenarios of genomic evaluation

Farhad Ghafouri-Kesbi,Mahmood Honarvar,Ghodratollah Rahimi-Mianji,Ardeshir Nejati-Javaremi

doi:10.1071/an15538

Farhad Ghafouri-Kesbi, Mahmood Honarvar + Show 2 more

https://doi.org/10.1071/an15538

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Three machine learning algorithms: Random Forests (RF), Boosting and Support Vector Machines (SVM) as well as Genomic Best Linear Unbiased Prediction (GBLUP) were used to predict genomic breeding values (GBV) and their predictive performance was compared in different combinations of heritability (0.1, 0.3, and 0.5), number of quantitative trait loci (QTL) (100, 1000) and distribution of QTL effects (normal, uniform and gamma). To this end, a genome comprised of five chromosomes, one Morgan each, was simulated on which 10000 bi-allelic single nucleotide polymorphisms were distributed. Pearson’s correlation between the true and predicted GBV and Mean Squared Error of GBV prediction were used, respectively, as measures of the predictive accuracy and the overall fit achieved with each method. In all methods, an increase in accuracy of prediction was seen following increase in heritability and decrease in the number of QTL. GBLUP had better predictive accuracy than machine learning methods in particular in the scenarios of higher number of QTL and normal and uniform distributions of QTL effects; though in most cases, the differences were non-significant. In the scenarios of small number of QTL and gamma distribution of QTL effects, Boosting outperformed other methods. Regarding Mean Squared Error of GBV prediction, in most cases Boosting outperformed other methods, although the estimates were close to that of GBLUP. Among methods studied, SVM with 0.6 gigabytes (GIG) was the most efficient user of memory followed by RF, GBLUP and Boosting with 1.2-GIG, 1.3-GIG and 2.3-GIG memory requirements, respectively. Regarding computational time, GBLUP, SVM, RF and Boosting ranked first, second, third and last with 10 min, 15 min, 75 min and 600 min, respectively. It was concluded that although stochastic gradient Boosting can predict GBV with high prediction accuracy, significantly longer computational time and memory requirement can be a serious limitation for this algorithm. Therefore, using of other variants of Boosting such as Random Boosting was recommended for genomic evaluation.

Full Text