Training set optimization of genomic prediction by means of EthAcc.

Brigitte Mangin, Ellen Goudemand-Dugue, Renaud Rincent, Charles-Elie Rabier, Laurence Moreau, Momiao Xiong

doi:10.1371/journal.pone.0205629

Abstract

Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in case of a structured population, in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as a start of the optimization algorithm. EthAcc’s precision and algorithm issues prevent it from reaching a good training set with a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.

Highlights

Prediction of unobserved individuals using genomic information has gained increasing importance in plant and animal breeding [1, 2].
This difference in selection can explain the extreme difference in the accuracy that was observed with the two training sets.
To delve deeper into what happened, we compared the causal QTLs detected by the forward selection approach of the multilocus mixed model (MLMM) in the two training sets previously optimized by CDmean and EthAcc.

Summary

Introduction

Prediction of unobserved individuals using genomic information has gained increasing importance in plant and animal breeding [1, 2]. It is an accurate tool for prediction of complex diseases in humans [3, 4] and is included in the precision medicine initiative [5]. A set of individuals that is both phenotyped and genotyped, the so-called training set, is used to train a model. The model is then applied to predict unobserved individuals, the so-called test set, on the basis of genotyping data alone.

The specific roles of EGD are articulated in the ‘author contributions’ section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
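The training set and test set workflow described in the Introduction can be illustrated with a minimal sketch. It is not the EthAcc method and not the authors' code: it assumes simulated genotypes coded 0/1/2, a ridge-regression marker model standing in for a generic genomic prediction model, and accuracy taken as the correlation between predicted and true genetic values. All names and parameter values (`fit_ridge`, `simulate_and_predict`, `lam`, `h2`, the sample sizes) are hypothetical choices made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Estimate marker effects by ridge regression on centered training data."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - y_mean); lam > 0 keeps the system invertible.
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta


def predict(X_new, x_mean, y_mean, beta):
    """Predict phenotypes of unobserved individuals from genotypes only."""
    return y_mean + (X_new - x_mean) @ beta


def simulate_and_predict(n_train=200, n_test=100, n_markers=500,
                         n_qtl=20, h2=0.5, lam=100.0, seed=0):
    """Simulate a trait, train on the training set, and score the test set."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    # Hypothetical genotypes: allele counts 0/1/2 at independent markers.
    geno = rng.binomial(2, 0.5, size=(n, n_markers)).astype(float)
    # A random subset of markers acts as causal QTLs (an assumption of this toy model).
    effects = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(size=n_qtl)
    g = geno @ effects  # true genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(scale=noise_sd, size=n)  # observed phenotypes
    # Training set: genotyped and phenotyped. Test set: genotypes only.
    x_mean, y_mean, beta = fit_ridge(geno[:n_train], y[:n_train], lam)
    pred = predict(geno[n_train:], x_mean, y_mean, beta)
    # Accuracy here: correlation between predictions and true genetic values.
    accuracy = np.corrcoef(pred, g[n_train:])[0, 1]
    return pred, g[n_train:], accuracy


pred, true_g, accuracy = simulate_and_predict()
print(f"test-set accuracy (simulated): {accuracy:.3f}")
```

In a simulation the true genetic values are known, so the accuracy can be computed directly. With real data they are not, which is the gap an a priori accuracy estimator such as EthAcc is meant to address.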



Journal: PLOS ONE	Publication Date: Feb 19, 2019
Citations: 27	License type: CC BY 4.0
