Genomic prediction in plants: opportunities for ensemble machine learning based approaches.

Muhammad Farooq, Aalt D.J Van Dijk, Dick De Ridder, Harm Nijveen, Shahid Mansoor

doi:10.12688/f1000research.122437.1

Open Access

Abstract

Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods is still lacking. The predictive performance of GP models may depend on many factors, including sample size, number of markers, population structure and genetic architecture.

Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different levels of genetic complexity, determined by the number of Quantitative Trait Loci (QTLs), heritability (h² and h²ₑ), population structure and linkage disequilibrium between causal nucleotides and other SNPs.

Results: Decision-tree-based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large-effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods.

Conclusions: Overall, this study provides insights into the role of ML in GP as well as guidelines for practitioners.
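The simulation design described in the Methods (traits with a chosen number of QTLs and a target heritability) can be illustrated with a short sketch. This is not the authors' simulation code. The function name `simulate_trait`, the allele-frequency range, the Gaussian effect sizes, and the reading of h²ₑ as the phenotypic-variance share of pairwise (additive-by-additive) epistasis are all illustrative assumptions. The sketch also ignores linkage disequilibrium and population structure, both of which the study varies explicitly.

```python
import random
import statistics


def _rescale(values, target_var):
    """Center `values` and rescale them to the requested population variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    k = (target_var / var) ** 0.5 if var > 0 else 0.0
    return [(v - mean) * k for v in values]


def simulate_trait(n_ind, n_snp, n_qtl, h2, h2_e=0.0, seed=1):
    """Hypothetical helper: simulate 0/1/2 genotypes and y = g_add + g_epi + e.

    Assumptions (illustrative only): unlinked SNPs in Hardy-Weinberg
    equilibrium, no population structure, normally distributed QTL effects,
    h2_e taken as the variance share of pairwise epistasis, and total
    phenotypic variance fixed near 1 so each share equals its target.
    """
    if h2 < 0 or h2_e < 0 or h2 + h2_e >= 1:
        raise ValueError("need h2, h2_e >= 0 and h2 + h2_e < 1")
    if h2_e > 0 and n_qtl < 2:
        raise ValueError("epistasis needs at least two QTLs")
    rng = random.Random(seed)

    # Each genotype counts minor alleles over two independent draws (HWE).
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snp)]
    X = [[int(rng.random() < p) + int(rng.random() < p) for p in freqs]
         for _ in range(n_ind)]

    # Center every SNP so epistatic products are uncorrelated with additive terms.
    col_means = [statistics.fmean(col) for col in zip(*X)]
    Z = [[x - m for x, m in zip(row, col_means)] for row in X]

    # Additive genetic values from randomly chosen QTLs.
    qtl = rng.sample(range(n_snp), n_qtl)
    add_eff = [rng.gauss(0, 1) for _ in qtl]
    add = [sum(b * z[j] for b, j in zip(add_eff, qtl)) for z in Z]

    # Additive-by-additive epistasis between disjoint QTL pairs (nonlinear signal).
    pairs = list(zip(qtl[0::2], qtl[1::2]))
    epi_eff = [rng.gauss(0, 1) for _ in pairs]
    epi = [sum(b * z[i] * z[j] for b, (i, j) in zip(epi_eff, pairs)) for z in Z]

    # Scale components to their target variance shares, then add noise.
    g_add = _rescale(add, h2)
    g_epi = _rescale(epi, h2_e)
    e = _rescale([rng.gauss(0, 1) for _ in range(n_ind)], 1 - h2 - h2_e)
    y = [a + b + c for a, b, c in zip(g_add, g_epi, e)]
    return X, y, g_add, g_epi
```

For example, `simulate_trait(500, 200, 10, h2=0.5)` should give a realized heritability `statistics.pvariance(g_add) / statistics.pvariance(y)` close to 0.5, with small deviations caused by sample covariance between components. Setting `h2_e > 0` adds a purely nonlinear component of the kind the abstract associates with an advantage for tree ensembles. In a benchmark, the genotype matrix `X` and phenotype `y` would then typically be split into cross-validation folds on which each model (GBLUP, RKHS, BayesA, BayesB, Random Forest, XGBoost) is fitted and scored.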

Journal: F1000Research
Publication date: Jul 18, 2022
Citations: 6
License: CC BY 4.0

Similar Papers

Genomic prediction in plants: opportunities for ensemble machine learning based approaches.
Muhammad Farooq ... Aalt D.J Van Dijk
F1000Research | VOL. 11 | 10 Jan 2023

Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs
Xue Wang ... Ao Qiu
Journal of Animal Science and Biotechnology | VOL. 13 | 17 May 2022

A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice.
Laval Jacquin ... Nourollah Ahmadi
Frontiers in Genetics | VOL. 7 | 09 Aug 2016

Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle
Victor B Pedrosa ... Luiz F Brito
Journal of Dairy Science | VOL. 107 | 22 Feb 2024