Independent Validation of Genomic Prediction in Strawberry Over Multiple Cycles
The University of Florida strawberry (Fragaria × ananassa) breeding program has implemented genomic prediction (GP) as a tool for choosing outstanding parents for crosses over the last five seasons. This has allowed the use of some parents 1 year earlier than with traditional methods, thus reducing the duration of the breeding cycle. However, as the number of breeding cycles increases over time, greater knowledge is needed on how multiple cycles can be used in the practical implementation of GP in strawberry breeding. Advanced selections and cultivars totaling 1,558 unique individuals were tested in field trials for yield and fruit quality traits over five consecutive years and genotyped for 9,908 SNP markers. Prediction of breeding values was carried out using Bayes B models. Independent validation was carried out using separate trials/years as training (TRN) and testing (TST) populations. Single-trial predictive abilities for five polygenic traits averaged 0.35, which was reduced to 0.24 when individuals common across trials were excluded, emphasizing the importance of relatedness among training and testing populations. Training populations including up to four previous breeding cycles increased predictive abilities, likely due to increases in both training population size and relatedness. Predictive ability was also strongly influenced by heritability, but less so by changes in linkage disequilibrium and effective population size. Genotype by year interactions were minimal. A strategy for practical implementation of GP in strawberry breeding is outlined that uses multiple cycles to predict parental performance and accounts for traits not included in GP models when constructing crosses. Given the importance of relatedness to the success of GP in strawberry, future work could focus on the optimization of relatedness in the design of TRN and TST populations to increase predictive ability in the short-term without compromising long-term genetic gains.
- Research Article
23
- 10.1186/1297-9686-46-37
- Jun 9, 2014
- Genetics Selection Evolution
Background: Accuracy of genomic prediction depends on number of records in the training population, heritability, effective population size, genetic architecture, and relatedness of training and validation populations. Many traits have ordered categories including reproductive performance and susceptibility or resistance to disease. Categorical scores are often recorded because they are easier to obtain than continuous observations. Bayesian linear regression has been extended to the threshold model for genomic prediction. The objective of this study was to quantify reductions in accuracy for ordinal categorical traits relative to continuous traits. Methods: Efficiency of genomic prediction was evaluated for heritabilities of 0.10, 0.25 or 0.50. Phenotypes were simulated for 2250 purebred animals using 50 QTL selected from actual 50k SNP (single nucleotide polymorphism) genotypes giving a proportion of causal to total loci of 0.0001. A Bayes C π threshold model simultaneously fitted all 50k markers except those that represented QTL. Estimated SNP effects were utilized to predict genomic breeding values in purebred (n = 239) or multibreed (n = 924) validation populations. Correlations between true and predicted genomic merit in validation populations were used to assess predictive ability. Results: Accuracies of genomic estimated breeding values ranged from 0.12 to 0.66 for purebred and from 0.04 to 0.53 for multibreed validation populations based on Bayes C π linear model analysis of the simulated underlying variable. Accuracies for ordinal categorical scores analyzed by the Bayes C π threshold model were 20% to 50% lower and ranged from 0.04 to 0.55 for purebred and from 0.01 to 0.44 for multibreed validation populations. Analysis of ordinal categorical scores using a linear model resulted in further reductions in accuracy. Conclusions: Threshold traits result in markedly lower accuracy than a linear model on the underlying variable. 
To achieve an accuracy equal or greater than for continuous phenotypes with a training population of 1000 animals, a 2.25 fold increase in training population size was required for categorical scores fitted with the threshold model. The threshold model resulted in higher accuracies than the linear model and its advantage was greatest when training populations were smallest.
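The idea behind a threshold trait, as discussed in the abstract above, is that an ordinal score is a coarsened view of an unobserved continuous variable. The sketch below illustrates only that coarsening step and the information it discards; it does not implement the Bayes C π threshold model. The cut points, sample size, and seed are arbitrary assumptions chosen for illustration.

```python
import random
from bisect import bisect_right
from math import sqrt


def to_ordinal(liability, thresholds):
    """Map a continuous liability to an ordered category index (0, 1, 2, ...).

    `thresholds` must be sorted in ascending order.
    """
    return bisect_right(thresholds, liability)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
liability = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # simulated underlying variable
thresholds = [-0.5, 0.5]  # hypothetical cut points giving three categories
scores = [to_ordinal(value, thresholds) for value in liability]

r = pearson(liability, scores)
print(f"correlation between liability and ordinal score: {r:.2f}")
```

Because the score only records which interval the underlying value fell into, its correlation with that value is below one, which is consistent with the lower accuracies reported for categorical scores.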
- Research Article
15
- 10.1007/s11032-021-01203-6
- Feb 1, 2021
- Molecular breeding : new strategies in plant improvement
The online version contains supplementary material available at 10.1007/s11032-021-01203-6.
- Research Article
37
- 10.1371/journal.pone.0179191
- Jun 9, 2017
- PLOS ONE
The objective of this study was to explore the potential of genomic prediction (GP) for soybean resistance against Sclerotinia sclerotiorum (Lib.) de Bary, the causal agent of white mold (WM). A diverse panel of 465 soybean plant introduction accessions was phenotyped for WM resistance in replicated field and greenhouse tests. All plant accessions were previously genotyped using the SoySNP50K BeadChip. The predictive ability of six GP models was compared, and the impact of marker density and training population size on the predictive ability was investigated. Cross-prediction among environments was tested to determine the effectiveness of the prediction models. GP models had similar prediction accuracies for all experiments. Predictive ability did not improve significantly by using more than 5k SNPs, or by increasing the training population size (from 50% to 90% of the total of individuals). The GP model effectively predicted WM resistance across field and greenhouse experiments when each was used as either the training or validation population. The GP model was able to identify WM-resistant accessions in the USDA soybean germplasm collection that had previously been reported and were not included in the study panel. This study demonstrated the applicability of GP to identify useful genetic sources of WM resistance for soybean breeding. Further research will confirm the applicability of the proposed approach to other complex disease resistance traits and in other crops.
- Research Article
91
- 10.1007/s00122-019-03276-6
- Jan 1, 2019
- TAG. Theoretical and Applied Genetics. Theoretische Und Angewandte Genetik
Key message: The optimization of training populations and the use of diagnostic markers as fixed effects increase the predictive ability of genomic prediction models in a cooperative wheat breeding panel. Plant breeding programs often have access to a large amount of historical data that is highly unbalanced, particularly across years. This study examined approaches to utilize these data sets as training populations to integrate genomic selection into existing pipelines. We used cross-validation to evaluate predictive ability in an unbalanced data set of 467 winter wheat (Triticum aestivum L.) genotypes evaluated in the Gulf Atlantic Wheat Nursery from 2008 to 2016. We evaluated the impact of different training population sizes and training population selection methods (Random, Clustering, PEVmean and PEVmean1) on predictive ability. We also evaluated inclusion of markers associated with major genes as fixed effects in prediction models for heading date, plant height, and resistance to powdery mildew (caused by Blumeria graminis f. sp. tritici). Increases in predictive ability as the size of the training population increased were more evident for Random and Clustering training population selection methods than for PEVmean and PEVmean1. The selection methods based on minimization of the prediction error variance (PEV) outperformed the Random and Clustering methods across all the population sizes. Major genes added as fixed effects always improved model predictive ability, with the greatest gains coming from combinations of multiple genes. Maximum predictabilities among all prediction methods were 0.64 for grain yield, 0.56 for test weight, 0.71 for heading date, 0.73 for plant height, and 0.60 for powdery mildew resistance. Our results demonstrate the utility of combining unbalanced phenotypic records with genome-wide SNP marker data for predicting the performance of untested genotypes.
- Abstract
9
- 10.1186/1753-6561-5-s7-o14
- Sep 13, 2011
- BMC Proceedings
Background A tree breeding program is characterized by long generation intervals which, over time, result in a much smaller number of breeding cycles when compared to annual crops. Moreover, most economically important traits in a tree-breeding program are quantitatively inherited, display low heritability and are expressed late in the life cycle. Genomic Selection (GS) is expected to be particularly valuable for tree species, leading to shorter generation intervals and improved genetic gain over time. The main factors that affect the accuracy of GS prediction models are the level of linkage disequilibrium (LD) in the training population, the training population size, the heritability of the trait and the number of QTL regulating its variation. However, it is yet largely unknown how stable prediction models are across environments and different ages. This knowledge is critical for tree breeders that wish to use genomic selection in their genetic improvement program. Here, we report the first assessment of the utility of genomic selection in a conifer species. We developed prediction models for growth traits measured at multiple sites, to evaluate the impact of genotype by environment interactions in their accuracy. Training populations were also measured over multiple ages and models were developed to assess their value in predicting breeding values later in the lifecycle.
- Research Article
- 10.18047/poljo.26.1.2
- Jun 23, 2020
- Poljoprivreda
Genomic prediction accuracy (r_MP) is affected by many factors, such as trait heritability, training population size and structure, and the number of markers. This study’s objective was to investigate the factors associated with r_MP for ear height and plant height at two planting densities in testcrosses of the maize (Zea mays L.) IBM population. Genetic correlations between the training and validation populations were calculated. High heritability estimates and correlations between the traits were observed. The non-zero estimates of r_MP for all trait-density combinations implied that genomic selection can be efficient. Lower-than-expected genetic correlations were observed between the training and validation populations. However, a strong correlation was observed between the genetic correlation of the training and validation populations and r_MP in all three sizes of training populations assessed (20-40%, 40-60%, and 60-80%), suggesting that the size of the training population can be kept low by an appropriate selection while maintaining a high r_MP. Further studies of relationships between the training and validation populations with larger effective population sizes are suggested, as reducing the size of the training population while maintaining a high r_MP can facilitate a more effective allocation of resources in a maize breeding program.
- Research Article
8
- 10.1002/csc2.20104
- Jan 1, 2020
- Crop Science
Genomic prediction accuracy is affected by population size, trait heritability, relatedness of training and validation populations, marker density, and genetic architecture. Nested association mapping (NAM) populations have advantages in many of these features compared with biparental families and may be an effective strategy for increasing prediction accuracy. The classic NAM design was modified to create a two‐row spring malting barley (Hordeum vulgare L.) population of 1341 F3:F4 lines in seven families that was phenotyped for heading date, plant height, leaf rust, spot blotch, pre‐harvest sprouting, and grain protein. Quantitative trait loci (QTL) were detected for plant height, leaf rust, pre‐harvest sprouting, and spot blotch with genome‐wide association analyses. Prediction accuracies were assessed in validation populations consisting of a single family or multiple families. Across‐family prediction accuracy (.607–.811) generally surpassed within‐family prediction accuracy, particularly for traits with high across‐family variance. Reductions in marker density (70–80%) and training population size (25–50%) did not cause significant loss of prediction accuracy. Addition of fixed marker effects from genome‐wide association had minimal impact on prediction accuracy in the full training population but improved accuracy in reduced training populations. Within‐family prediction for traits highly influenced by family structure was improved by adding half‐sibs to the training population. Connected half‐sib training populations could be useful for new and established breeding programs looking to implement genomic selection due to benefits of family structure on prediction accuracy, genotyping, genetic diversity, and genetic mapping.
- Research Article
4
- 10.1016/j.cj.2021.09.001
- Oct 23, 2021
- The Crop Journal
A novel genomic prediction method combining randomized Haseman-Elston regression with a modified algorithm for Proven and Young for large genomic data
- Research Article
40
- 10.1186/s12870-022-03479-y
- Feb 26, 2022
- BMC Plant Biology
Background: Genomic selection is a powerful tool in plant breeding. By building a prediction model using a training set with markers and phenotypes, genomic estimated breeding values (GEBVs) can be used as predictions of breeding values in a target set with only genotype data. There is, however, limited information on how prediction accuracy of genomic prediction can be optimized. The objective of this study was to evaluate the performance of 11 genomic prediction models across species in terms of prediction accuracy for two traits with different heritabilities using several subsets of markers and training population proportions. Species studied were maize (Zea mays, L.), soybean (Glycine max, L.), and rice (Oryza sativa, L.), which vary in linkage disequilibrium (LD) decay rates and have contrasting genetic architectures. Results: Correlations between observed and predicted GEBVs were determined via cross validation for three training-to-testing proportions (90:10, 70:30, and 50:50). Maize, which has the shortest extent of LD, showed the highest prediction accuracy. Amongst all the models tested, Bayes B performed better than or equal to all other models for each trait in all the three crops. Traits with higher broad-sense and narrow-sense heritabilities were associated with higher prediction accuracy. When subsets of markers were selected based on LD, the accuracy was similar to that observed from the complete set of markers. However, prediction accuracies were significantly improved when using a subset of total markers that were significant at P ≤ 0.05 or P ≤ 0.10. As expected, exclusion of QTL-associated markers in the model reduced prediction accuracy. Prediction accuracy varied among different training population proportions. Conclusions: We conclude that prediction accuracy for genomic selection can be improved by using the Bayes B model with a subset of significant markers and by selecting the training population based on narrow sense heritability.
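The training-to-testing proportions mentioned in the abstract above (90:10, 70:30, and 50:50) amount to random partitions of the genotyped lines. The following is a minimal sketch of such a partition only; the function name, the seed handling, and the line identifiers are hypothetical, and no prediction model is fitted.

```python
import random


def split_train_test(ids, train_fraction, seed=0):
    """Randomly partition ids into training and testing lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


lines = [f"line_{i}" for i in range(200)]  # hypothetical genotype identifiers
for fraction in (0.9, 0.7, 0.5):
    trn, tst = split_train_test(lines, fraction)
    print(f"{fraction:.0%} training: {len(trn)} TRN, {len(tst)} TST")
```

In practice the split would be repeated over several seeds and the correlation between observed and predicted values averaged across repeats.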
- Research Article
2
- 10.1002/tpg2.70018
- Mar 31, 2025
- The Plant Genome
High‐throughput digital phenotyping (DP) has been widely explored in plant breeding to assess large numbers of genotypes with minimal manual labor and reduced cost and time. DP platforms using high‐resolution images captured by drones and tractor‐based platforms have recently allowed the University of Florida strawberry (Fragaria × ananassa) breeding program to assess vegetative biomass at scale. Biomass has not previously been explored in a strawberry breeding context due to the labor required and the need to destroy the plant. This study aims to understand the genetic basis of predicted vegetative biomass and biomass‐related traits and to chart a path for the combined use of DP and genomics in strawberry breeding. Aboveground dry vegetative biomass was estimated by adapting a previously published model using ground‐truth data on a subset of breeding germplasm. High‐resolution images were collected on clonally replicated trials at different time points during the fruiting season. There was moderate to high heritability (h2 = 0.26–0.56) for predicted vegetative biomass, and genetic correlations between vegetative biomass and marketable yield were mostly positive (rG = −0.13–0.47). Fruit yield traits scaled on a vegetative biomass basis also had moderate to high heritability (h2 = 0.25–0.64). This suggests that vegetative biomass can be decreased or increased through selection, and that marketable fruit yield can be improved without simultaneously increasing plant size. No consistent marker‐trait associations were discovered via genome‐wide association studies. On the other hand, predictive abilities from genomic selection ranged from 0.15 to 0.46 across traits and years, suggesting that genomic prediction will be an effective breeding tool for vegetative biomass in strawberry.
- Research Article
59
- 10.1186/s12863-019-0785-1
- Nov 1, 2019
- BMC Genetics
Background: Genomic selection has the potential to increase genetic gains by using molecular markers as predictors of breeding values of individuals. This study evaluated the accuracy of predictions for grain yield, heading date, plant height, and yield components in soft red winter wheat under different prediction scenarios. Response to selection for grain yield was also compared across different selection strategies: phenotypic, marker-based, genomic, combination of phenotypic and genomic, and random selections. Results: Genomic selection was implemented through a ridge regression best linear unbiased prediction model in two scenarios: cross-validations and independent predictions. Accuracy for cross-validations was assessed using a diverse panel under different marker number, training population size, relatedness between training and validation populations, and inclusion of fixed effect in the model. The population in the first scenario was then trained and used to predict grain yield of biparental populations for independent validations. Using subsets of significant markers from association mapping increased accuracy by 64–70% for grain yield but resulted in lower accuracy for traits with high heritability such as plant height. Increasing size of training population resulted in an increase in accuracy, with maximum values reached when ~60% of the lines were used as a training panel. Predictions using related subpopulations also resulted in higher accuracies. Inclusion of major growth habit genes as fixed effect in the model caused increase in grain yield accuracy under a cross-validation procedure. Independent predictions resulted in accuracy ranging between −0.14 and 0.43, dependent on the grouping of site-year data for the training and validation populations. Genomic selection was “superior” to marker-based selection in terms of response to selection for yield. 
Supplementing phenotypic with genomic selection resulted in approximately 10% gain in response compared to using phenotypic selection alone. Conclusions: Our results showed the effects of different factors on accuracy for yield and agronomic traits. Among the factors studied, training population size and relatedness between training and validation population had the greatest impact on accuracy. Ultimately, combining phenotypic with genomic selection would be relevant for accelerating genetic gains for yield in winter wheat.
- Research Article
126
- 10.1016/j.molp.2024.03.007
- Mar 12, 2024
- Molecular plant
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP—theoretically reaching one when using the Pearson’s correlation as a metric—is an active research area as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
- Research Article
- 10.1093/g3journal/jkaf218
- Oct 23, 2025
- G3: Genes | Genomes | Genetics
Genomic prediction (GP) has been shown to be a valuable tool for genetic improvement in breeding programs but requires large training populations in order to build robust models. These are difficult to obtain for newly established breeding programs. Here, we aimed to overcome this challenge by combining datasets from 4 different barley breeding programs, utilizing up to 12 years of data to increase prediction accuracy in a more recently established 6-rowed winter (6RW) barley breeding program. By allowing data to accumulate in a breeding program as the years progress, we investigated when GP accuracy in 6RW benefitted from external populations. To do this, we focused on several parameters: training population size, choice of model for multipopulation GP (univariate versus multivariate), the key trait under investigation (grain yield, plant height, or rust resistance), and genetic distance between populations. We found that in the early stages of a breeding program, prediction of the 6RW population could benefit from inclusion of an external population, but the advantage depended on the specific population and trait under investigation. However, when data from all 4 years were available, multipopulation GP generally performed similarly to within-population GP. Additionally, when comparing multivariate and univariate models for multipopulation GP, the multivariate model often performed significantly worse, despite strong genetic correlations between the populations involved. This was especially the case when data were sparse and the model required estimation of numerous parameters from a small number of observations. Altogether, our results suggest that multipopulation GP is beneficial only in the very early stages of new breeding programs, emphasizing its relevance for newly established breeding programs or new breeding goals, especially for related populations.
- Research Article
81
- 10.1371/journal.pone.0169606
- Jan 12, 2017
- PLOS ONE
Wheat breeding programs generate a large amount of variation which cannot be completely explored because of limited phenotyping throughput. Genomic prediction (GP) has been proposed as a new tool which provides breeding value estimates without the need to phenotype all the material produced, only a subset of it called the training population (TP). However, genotyping of all the accessions under analysis is needed and, therefore, optimizing TP dimension and genotyping strategy is pivotal to implement GP in commercial breeding schemes. Here, we explored the optimum TP size and we integrated pedigree records and genome wide association studies (GWAS) results to optimize the genotyping strategy. A total of 988 advanced wheat breeding lines were genotyped with the Illumina 15K SNPs wheat chip and phenotyped across several years and locations for yield, lodging, and starch content. Cross-validation using the largest possible TP size and all the SNPs available after editing (~11k), yielded predictive abilities (rGP) ranging between 0.5–0.6. In order to explore the training population size, rGP were computed using progressively smaller TP. These exercises showed that TP of around 700 lines were enough to yield the highest observed rGP. Moreover, rGP were calculated by randomly reducing the number of SNPs. This showed that around 1K markers were enough to reach the highest observed rGP. GWAS was used to identify markers associated with the traits analyzed. A GWAS-based selection of SNPs resulted in increased rGP when compared with random selection and a few hundred SNPs were sufficient to obtain the highest observed rGP. For each of these scenarios, advantages of adding the pedigree information were shown. Our results indicate that moderate TP sizes were enough to yield high rGP and that pedigree information and GWAS results can be used to greatly optimize the genotyping strategy.
- Research Article
61
- 10.1186/s12863-017-0476-8
- Jan 26, 2017
- BMC Genetics
Background: New Zealand has some unique Terminal Sire composite sheep breeds, which were developed in the last three decades to meet commercial needs. These composite breeds were developed based on crossing various Terminal Sire and Maternal breeds and, therefore, present high genetic diversity compared to other sheep breeds. Their breeding programs are focused on improving carcass and meat quality traits. There is an interest from the industry to implement genomic selection in this population to increase the rates of genetic gain. Therefore, the main objectives of this study were to determine the accuracy of predicted genomic breeding values for various growth, carcass and meat quality traits using a HD SNP chip and to evaluate alternative genomic relationship matrices, validation designs and genomic prediction scenarios. A large multi-breed population (n = 14,845) was genotyped with the HD SNP chip (600 K) and phenotypes were collected for a variety of traits. Results: The average observed accuracies (± SD) for traits measured in the live animal, carcass, and meat quality traits ranged from 0.18 ± 0.07 to 0.33 ± 0.10, 0.28 ± 0.09 to 0.55 ± 0.05 and 0.21 ± 0.07 to 0.36 ± 0.08, respectively, depending on the scenario/method used in the genomic predictions. When accounting for population stratification by adjusting for 2, 4 or 6 principal components (PCs) the observed accuracies of molecular breeding values (mBVs) decreased or remained constant for all traits. The mBVs observed accuracies when fitting both G and A matrices were similar to fitting only G matrix. The lowest accuracies were observed for k-means cross-validation and forward validation performed within each k-means cluster. Conclusions: The accuracies observed in this study support the feasibility of genomic selection for growth, carcass and meat quality traits in New Zealand Terminal Sire breeds using the Ovine HD SNP chip. 
There was a clear advantage in using a mixed training population instead of performing analyses per genomic cluster. In order to perform genomic predictions per breed group, genotyping more animals is recommended to increase the size of the training population within each group and the genetic relationship between training and validation populations. The different scenarios evaluated in this study will help geneticists and breeders to make wiser decisions in their breeding programs.