Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato.

Stefan Wilson, Fred Van Eeuwijk, Han A Mulder, Richard G F Visser, Chris Maliepaard, Marcos Malosetti

doi:10.3389/fpls.2021.771075

Open Access

Abstract

Training set construction is an important prerequisite for Genomic Prediction (GP), and while it has been studied in diploids, polyploids have not received the same attention. Polyploidy is common in many crop plants, for example banana and blueberry, and also potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling, and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out to define sub-populations, but since sub-populations accounted for only 16.6% of the genetic variation, there were negligible differences between stratified and simple random sampling. For genetic distance sampling, four genetic distance measures were compared; although they performed similarly, Euclidean distance was the most consistent. In the majority of cases CDmean was the best sampling method: compared with simple random sampling, it gave improvements of 4–14% in cross-validation scenarios and 2–8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5–10.5% and 0.4–4.5%, respectively. No interaction was found between sampling method and statistical model for the traits analyzed.
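CDmean (Rincent et al., 2012) scores a candidate training set by the average coefficient of determination (CD) of the contrasts between each unphenotyped individual and the population mean, using the mixed-model equations of a GBLUP-type model. The sketch below is a simplified illustration, not the authors' implementation. It assumes a VanRaden-style relationship matrix for tetraploid dosages coded 0–4, a fixed variance ratio `lam` (residual over genetic variance), an intercept as the only fixed effect, and a random-swap exchange search; the function names are hypothetical. By default the targets are all lines outside the training set (as in cross-validation), but a fixed independent test set can be passed instead.

```python
import numpy as np


def tetraploid_grm(dosage):
    """VanRaden-style genomic relationship matrix for ploidy 4 (assumed form).

    dosage: (n_individuals, n_markers) array of allele dosages coded 0..4.
    """
    ploidy = 4
    p = dosage.mean(axis=0) / ploidy          # allele frequency per marker
    W = dosage - ploidy * p                   # centre each marker column
    return W @ W.T / (ploidy * np.sum(p * (1 - p)))


def cdmean(K, train, targets, lam):
    """Mean CD of contrasts (target i minus the population mean).

    lam = residual variance / genetic variance (assumed known).
    """
    N, n = K.shape[0], len(train)
    Z = np.zeros((n, N))
    Z[np.arange(n), train] = 1.0              # incidence matrix of phenotyped lines
    M = np.eye(n) - np.ones((n, n)) / n       # projector that removes the intercept
    A = Z.T @ M @ Z
    # K (A K + lam I)^-1 equals (A + lam K^-1)^-1 without inverting K,
    # which matters because a centred relationship matrix is singular.
    C = K @ np.linalg.inv(A @ K + lam * np.eye(N))
    c = -np.ones((N, len(targets))) / N       # one contrast per column
    c[targets, np.arange(len(targets))] += 1.0
    num = np.einsum("ij,ij->j", c, (K - lam * C) @ c)   # c'(K - lam*C)c
    den = np.einsum("ij,ij->j", c, K @ c)               # c'Kc
    return float(np.mean(num / den))


def exchange_cdmean(K, candidates, n_train, lam, targets=None, n_iter=300, seed=0):
    """Swap one training line for an unused candidate; keep the swap only
    if CDmean increases. targets=None means all lines outside the training set."""
    rng = np.random.default_rng(seed)
    everyone = np.arange(K.shape[0])

    def score(train):
        tg = np.setdiff1d(everyone, train) if targets is None else targets
        return cdmean(K, train, tg, lam)

    train = rng.choice(candidates, size=n_train, replace=False)
    best = score(train)
    for _ in range(n_iter):
        proposal = train.copy()
        pool = np.setdiff1d(candidates, train)
        proposal[rng.integers(n_train)] = rng.choice(pool)
        s = score(proposal)
        if s > best:
            train, best = proposal, s
    return np.sort(train), best


# Usage on simulated dosages (not the potato panel from the study).
rng = np.random.default_rng(1)
dosage = rng.integers(0, 5, size=(60, 200)).astype(float)
K = tetraploid_grm(dosage)
train, score = exchange_cdmean(K, np.arange(60), n_train=20, lam=1.0)
print(len(train), round(score, 3))
```

Here `lam=1.0` corresponds to an assumed heritability of 0.5. A full implementation would typically use more iterations and restarts; the exchange step is only accepted when it raises CDmean, so the returned score is never below that of the random starting set.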

Highlights

The utilization of DNA marker information for selection in breeding programs has increased over the last two decades, which can be attributed to two factors: the decrease in genotyping costs and advances in quantitative genetics methodology.
Prediction accuracy: as mentioned in previous sections, the training set construction methods are ranked by a measure of prediction accuracy.
In both the TV and TT schemes, the observed phenotypic values of the training set are fed to the statistical models to estimate marker effects, while the phenotypic values of the validation set (TV scheme) and of the test set (TT scheme) are hidden from the model.
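The TV/TT logic can be sketched in a few lines: marker effects are estimated from the training lines only, the hidden lines (validation set in TV, test set in TT) are predicted from their markers, and accuracy is taken here as Pearson's correlation between predicted and observed phenotypes. This is an illustration under stated assumptions: ridge regression with a fixed shrinkage `lam` stands in for the study's statistical models, the data are simulated, and the helper names are hypothetical.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """RR-BLUP-style marker effects with a fixed shrinkage parameter lam.

    X: (n, m) column-centred marker matrix, y: (n,) phenotypes.
    Returns (intercept, marker effects); the intercept is not penalised.
    """
    mu = float(y.mean())
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - mu))
    return mu, beta


def prediction_accuracy(dosage, y, train, hidden, lam=10.0):
    """Fit on training lines only, predict the hidden lines, return Pearson's r.

    Phenotypes of the hidden lines are used only for scoring, never for fitting.
    """
    centre = dosage[train].mean(axis=0)       # centre with training-set means only
    mu, beta = ridge_marker_effects(dosage[train] - centre, y[train], lam)
    y_hat = mu + (dosage[hidden] - centre) @ beta
    return float(np.corrcoef(y_hat, y[hidden])[0, 1])


# Simulated example: 150 lines, 30 markers, heritability near 0.5.
rng = np.random.default_rng(2)
dosage = rng.integers(0, 5, size=(150, 30)).astype(float)
effects = rng.normal(0.0, 0.5, size=30)                 # simulated true effects
g = (dosage - dosage.mean(axis=0)) @ effects            # true genetic values
y = g + rng.normal(0.0, g.std(), size=150)              # add residual noise
order = rng.permutation(150)
train, hidden = order[:100], order[100:]                # 100 training, 50 hidden
print(round(prediction_accuracy(dosage, y, train, hidden), 2))
```

Comparing sampling methods then amounts to repeating this with training sets drawn by each method and comparing the resulting correlations.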

Summary

Introduction

The utilization of DNA marker information for selection in breeding programs has increased over the last two decades, which can be attributed to two factors: the decrease in genotyping costs and advances in quantitative genetics methodology. The potential genetic gains from GP hinge on its ability to predict phenotypes accurately. This prediction accuracy depends on various factors, including but not restricted to trait heritability (Heffner et al., 2009), statistical models (de los Campos et al., 2013), the genetic architecture of traits (Daetwyler et al., 2013), population structure (Asoro et al., 2011; Guo et al., 2014), and the size and composition of the training/calibration set (Pszczola et al., 2012; Rincent et al., 2012; Bustos-Korts et al., 2016; Akdemir and Isidro-Sanchez, 2019). The training set should be constructed so that it covers a space that closely resembles the space occupied by future test sets. This matters for GP because, with relatively cheap genotyping, molecular marker information (the explanatory variables) can now often be collected more efficiently than phenotype information (the target). The question is: which individuals should be phenotyped and used to calibrate the model so that it generates reliable predictions for individuals without phenotypic information?
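One way to make "covering the space of future test sets" concrete is to measure, for each test line, the Euclidean distance in marker space to its nearest training line, and average over the test set. The sketch below assumes raw 0–4 dosages as coordinates and adds a hypothetical greedy selection (a facility-location heuristic) that picks training lines to shrink that average distance. It is not the genetic distance sampling procedure used in the study, whose exact selection rule is not described here.

```python
import numpy as np


def euclidean_distances(A, B):
    """Pairwise Euclidean distances between rows of A and rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))       # clip tiny negative rounding errors


def coverage(dosage, train, test):
    """Mean distance from each test line to its nearest training line.

    Smaller values mean the training set lies closer to the test set.
    """
    D = euclidean_distances(dosage[test], dosage[train])
    return float(D.min(axis=1).mean())


def greedy_cover(dosage, candidates, test, n_train):
    """Hypothetical greedy selection: repeatedly add the candidate that most
    reduces the coverage score. Not the study's genetic distance method."""
    D = euclidean_distances(dosage[test], dosage[candidates])  # (n_test, n_cand)
    nearest = np.full(len(test), np.inf)      # current nearest-training distance
    chosen = []
    for _ in range(n_train):
        scores = np.minimum(nearest[:, None], D).mean(axis=0)
        scores[chosen] = np.inf               # never pick the same candidate twice
        j = int(np.argmin(scores))
        chosen.append(j)
        nearest = np.minimum(nearest, D[:, j])
    return np.asarray(candidates)[chosen]


# Simulated dosages: 100 candidate lines and a separate test set of 20.
rng = np.random.default_rng(3)
dosage = rng.integers(0, 5, size=(120, 100)).astype(float)
candidates, test = np.arange(100), np.arange(100, 120)
random_train = rng.choice(candidates, size=15, replace=False)
greedy_train = greedy_cover(dosage, candidates, test, 15)
print(coverage(dosage, random_train, test), coverage(dosage, greedy_train, test))
```

A low coverage score only says the training set sits close to the test lines; whether that translates into higher prediction accuracy is what the comparison of sampling methods in this study evaluates.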

Journal: Frontiers in Plant Science. Publication date: Nov 24, 2021
License type: CC BY 4.0

Supplementary files (30 Nov 2021): Data_Sheet_1.CSV, Data_Sheet_2.CSV, Table_1.DOCX, Table_2.DOCX