Genomic insights into population structure and predictive breeding for climate-resilient coffee.

Abstract

Climate change poses a growing threat to global coffee production, particularly for Coffea arabica, the most widely cultivated species. Coffea canephora (Robusta), with greater tolerance to heat and environmental stress, represents a critical genetic resource for sustaining future supply. Despite its increasing importance, the species is still relatively understudied with respect to population structure and trait architecture, factors that are important for guiding breeding efforts. Here, we combine population genetic analyses with genomic prediction to inform the improvement of C. canephora using a representative breeding collection from West Africa. First, we characterized the genetic structure of the cultivated germplasm and confirmed the presence of three main genetic pools: Robusta, Conilon, and Guinean. Second, we quantified phenotypic variation and genetic parameters for 11 agronomic traits, demonstrating a significant contribution of non-additive effects, particularly for yield. Third, we evaluated the performance of genomic prediction models incorporating additive and dominance effects, and proposed their integration into a reciprocal recurrent selection scheme to exploit heterosis. Altogether, our findings highlight the utility of incorporating structured genetic diversity and non-additive effects into breeding strategies. The framework presented here provides a foundation for improving the predictive accuracy and long-term adaptability of C. canephora, with broader implications for genomic-assisted breeding under climate stress.
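
Genomic prediction models with additive and dominance effects are often built on marker-based relationship matrices. The following is a minimal sketch of one common construction, assuming biallelic SNPs coded as 0/1/2 allele counts, the VanRaden parameterization for the additive matrix and the Vitezica et al. parameterization for the dominance matrix. The abstract does not state which parameterization the authors used, and the function name `relationship_matrices` is illustrative only.

```python
import numpy as np

def relationship_matrices(M):
    """Return additive (G) and dominance (D) relationship matrices.

    M: (n_individuals, n_markers) array of allele counts coded 0, 1, 2.
    Assumption: biallelic markers with no missing calls.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0           # frequency of the counted allele
    keep = (p > 0) & (p < 1)           # drop monomorphic markers
    M, p = M[:, keep], p[keep]
    q = 1.0 - p

    # Additive matrix: center allele counts by 2p, scale by 2 * sum(pq).
    Z = M - 2.0 * p
    G = Z @ Z.T / (2.0 * np.sum(p * q))

    # Dominance matrix: genotype-specific coding
    # (2 copies -> -2q^2, heterozygote -> 2pq, 0 copies -> -2p^2).
    W = np.where(M == 2, -2.0 * q**2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    D = W @ W.T / np.sum((2.0 * p * q) ** 2)
    return G, D
```

Both matrices can then be supplied as covariance structures for separate random effects in a mixed model; how the variance components are estimated is outside the scope of this sketch.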

Similar Papers
  • Research Article
  • Cite count: 37
  • 10.1007/s00122-021-03822-1
Improved genomic prediction of clonal performance in sugarcane by exploiting non-additive genetic effects
  • Apr 26, 2021
  • Theoretical and Applied Genetics
  • Seema Yadav + 11 more

Key message: Non-additive genetic effects seem to play a substantial role in the expression of complex traits in sugarcane. Including non-additive effects in genomic prediction models significantly improves the prediction accuracy of clonal performance. In the recent decade, genetic progress has been slow in sugarcane. One reason might be that non-additive genetic effects contribute substantially to complex traits. Dense marker information provides the opportunity to exploit non-additive effects in genomic prediction. In this study, a series of genomic best linear unbiased prediction (GBLUP) models that account for additive and non-additive effects were assessed to improve the accuracy of clonal prediction. The reproducing kernel Hilbert space model, which captures non-additive genetic effects, was also tested. The models were compared using 3,006 genotyped elite clones measured for cane per hectare (TCH), commercial cane sugar (CCS), and Fibre content. Three forward prediction scenarios were considered to investigate the robustness of genomic prediction. By using a pseudo-diploid parameterization, we found significant non-additive effects that accounted for almost two-thirds of the total genetic variance for TCH. Average heterozygosity also had a major impact on TCH, indicating that directional dominance may be an important source of phenotypic variation for this trait. The extended-GBLUP model improved the prediction accuracies by at least 17% for TCH, but no improvement was observed for CCS and Fibre. Our results imply that non-additive genetic variance is important for complex traits in sugarcane, although further work is required to better understand the variance component partitioning in a highly polyploid context. Genomics-based breeding will likely benefit from exploiting non-additive genetic effects, especially in designing crossing schemes. These findings can help to improve clonal prediction, enabling a more accurate identification of variety candidates for the sugarcane industry.
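
To illustrate what a GBLUP prediction step involves, here is a minimal sketch that computes best linear unbiased predictions of genetic values from a relationship matrix. It assumes a single random genetic effect, an overall mean as the only fixed effect, and a known ratio of residual to genetic variance (`lam`); in practice the variance components are estimated, typically by REML. The function name and arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def gblup_predict(K, y, train, lam):
    """Predict genetic values for all individuals from the training subset.

    K:     (n, n) relationship matrix covering training and target individuals.
    y:     length-n phenotype vector (entries outside `train` are ignored).
    train: integer indices of phenotyped individuals.
    lam:   assumed ratio sigma2_e / sigma2_g (treated as known here).
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    train = np.asarray(train)

    # Phenotypic covariance of the training set, up to the factor sigma2_g.
    V = K[np.ix_(train, train)] + lam * np.eye(len(train))
    ones = np.ones(len(train))

    # Generalized least squares estimate of the overall mean.
    mu = ones @ np.linalg.solve(V, y[train]) / (ones @ np.linalg.solve(V, ones))

    # BLUP of genetic values for every individual in K.
    alpha = np.linalg.solve(V, y[train] - mu)
    g_hat = K[:, train] @ alpha
    return mu, g_hat
```

An extended model with additive and dominance terms can be approximated in this sketch by passing a weighted sum of the two relationship matrices as `K`, with the weight set to an assumed ratio of dominance to additive variance.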

  • Research Article
  • Cite count: 171
  • 10.1007/s00122-013-2255-x
The impact of population structure on genomic prediction in stratified populations
  • Jan 24, 2014
  • Theoretical and Applied Genetics
  • Zhigang Guo + 8 more

Impacts of population structure on the evaluation of genomic heritability and prediction were investigated and quantified using high-density markers in diverse panels in rice and maize. Population structure is an important factor affecting estimation of genomic heritability and assessment of genomic prediction in stratified populations. In this study, our first objective was to assess effects of population structure on estimations of genomic heritability using the diversity panels in rice and maize. Results indicate population structure explained 33 and 7.5% of genomic heritability for rice and maize, respectively, depending on traits, with the remaining heritability explained by within-subpopulation variation. Estimates of within-subpopulation heritability were higher than that derived from quantitative trait loci identified in genome-wide association studies, suggesting 65% improvement in genetic gains. The second objective was to evaluate effects of population structure on genomic prediction using cross-validation experiments. When population structure exists in both training and validation sets, correcting for population structure led to a significant decrease in accuracy with genomic prediction. In contrast, when prediction was limited to a specific subpopulation, population structure showed little effect on accuracy and within-subpopulation genetic variance dominated predictions. Finally, effects of genomic heritability on genomic prediction were investigated. Accuracies with genomic prediction increased with genomic heritability in both training and validation sets, with the former showing a slightly greater impact. In summary, our results suggest that the population structure contribution to genomic prediction varies based on prediction strategies, and is also affected by the genetic architectures of traits and populations. 
In practical breeding, these conclusions may be helpful to better understand and utilize the different genetic resources in genomic prediction.

  • Research Article
  • Cite count: 71
  • 10.1016/j.oneear.2021.06.002
Detecting vulnerability of humid tropical forests to multiple stressors
  • Jul 1, 2021
  • One Earth
  • Sassan Saatchi + 57 more

  • Research Article
  • Cite count: 5
  • 10.12688/f1000research.122437.2
Genomic prediction in plants: opportunities for ensemble machine learning based approaches.
  • Jan 10, 2023
  • F1000Research
  • Muhammad Farooq + 4 more

Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h² and h²ₑ), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.

  • Research Article
  • Cite count: 7
  • 10.12688/f1000research.122437.1
Genomic prediction in plants: opportunities for ensemble machine learning based approaches
  • Dec 26, 2022
  • F1000Research
  • Muhammad Farooq + 4 more

Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h² and h²ₑ), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.

  • Research Article
  • 10.1093/g3journal/jkaf176
Genomic and hyperspectral imaging-based prediction blending enables selection for reduced deoxynivalenol content in wheat grains
  • Aug 6, 2025
  • G3: Genes | Genomes | Genetics
  • Jonathan S Concepcion + 4 more

Breeding for low deoxynivalenol (DON) mycotoxin content in wheat is challenging due to the complexity of the trait and phenotyping limitations. Since phenomic prediction relies on nonadditive effects and genomic prediction on additive effects, their complementarity can improve selection accuracy. In this study DON-infected wheat kernels were imaged using a hyperspectral camera to generate reflectance values across the spectrum of visible and near-infrared light that were used in phenomic predictions. Five Bayesian generalized linear regression models and 2 machine learning models were trained using phenomic and genomic predictions from advanced soft winter wheat breeding lines evaluated in 2021 and 2022. Across all training sets and models, phenomic predictions using wavebands in the visible light spectrum (400 to 700 nm) had higher predictive ability than genomic predictions or phenomic predictions using the full waveband range (400 to 1,000 nm). Forward prediction using 2021 trial, 2022 trial, and combined trials as the training set was performed using model blending on 2 sets of F4:5 selection candidates evaluated independently in 2022 and 2023. The phenotypic and genetic correlations, as well as indirect selection accuracies, of the model averages of phenomic predictions and combined phenomic and genomic predictions were higher than genomic predictions alone. Accuracies depended on the combination of training set and selection candidates. Unsupervised K-means clustering using the blended predicted values partitioned selection candidates into 2 groups with high and low mean observed DON content. This study demonstrates the potential of hyperspectral imaging-based phenomic prediction to complement genomic prediction and highlights considerations for prediction-based selection of low DON in wheat.

  • Research Article
  • Cite count: 12
  • 10.1093/genetics/iyac018
Improving genomic predictions with inbreeding and nonadditive effects in two admixed maize hybrid populations in single and multienvironment contexts.
  • Feb 12, 2022
  • Genetics
  • Morgane Roth + 5 more

Genetic admixture, resulting from the recombination between structural groups, is frequently encountered in breeding populations. In hybrid breeding, crossing admixed lines can generate substantial nonadditive genetic variance and contrasted levels of inbreeding which can impact trait variation. This study aimed at testing recent methodological developments for the modeling of inbreeding and nonadditive effects in order to increase prediction accuracy in admixed populations. Using two maize (Zea mays L.) populations of hybrids admixed between dent and flint heterotic groups, we compared a suite of five genomic prediction models incorporating (or not) parameters accounting for inbreeding and nonadditive effects with the natural and orthogonal interaction approach in single and multienvironment contexts. In both populations, variance decompositions showed the strong impact of inbreeding on plant yield, height, and flowering time which was supported by the superiority of prediction models incorporating this effect (+0.038 in predictive ability for mean yield). In most cases dominance variance was reduced when inbreeding was accounted for. The model including additivity, dominance, epistasis, and inbreeding effects appeared to be the most robust for prediction across traits and populations (+0.054 in predictive ability for mean yield). In a multienvironment context, we found that the inclusion of nonadditive and inbreeding effects was advantageous when predicting hybrids not yet observed in any environment. Overall, comparing variance decompositions was helpful to guide model selection for genomic prediction. Finally, we recommend the use of models including inbreeding and nonadditive parameters following the natural and orthogonal interaction approach to increase prediction accuracy in admixed populations.

  • Research Article
  • Cite count: 2
  • 10.5187/jast.2025.e2
Effect of breed composition in genomic prediction using crossbred pig reference population.
  • Jan 1, 2025
  • Journal of animal science and technology
  • Euiseo Hong + 9 more

In contrast to conventional genomic prediction, which typically targets a single breed and circumvents the necessity for population structure adjustments, multi-breed genomic prediction necessitates accounting for population structure to mitigate potential bias. The presence of this structure in multi-breed datasets can influence prediction accuracy, rendering proper modeling crucial for achieving unbiased results. This study aimed to address the effect of population structure on multi-breed genomic prediction, particularly focusing on crossbred reference populations. The prediction accuracy of genomic models was assessed by incorporating genomic breed composition (GBC) or principal component analysis (PCA) into the genomic best linear unbiased prediction (GBLUP) model. The accuracy of five different genomic prediction models was evaluated using data from 354 Duroc × Korean native pig crossbreds, 1,105 Landrace × Korean native pig crossbreds, and 1,107 Landrace × Yorkshire × Duroc crossbreds. The models tested were GBLUP without population structure adjustment, GBLUP with PCA as a fixed effect, GBLUP with GBC as a fixed effect, GBLUP with PCA as a random effect, and GBLUP with GBC as a random effect. The highest prediction accuracies for backfat thickness (0.59) and carcass weight (0.50) were observed in Models 1, 4, and 5. In contrast, Models 2 and 3, which included population structure as a fixed effect, exhibited lower accuracies, with backfat thickness accuracies of 0.40 and 0.53 and carcass weight accuracies of 0.34 and 0.38, respectively. These findings suggest that in multi-breed genomic prediction, the most efficient and accurate approach is either to forgo adjusting for population structure or, if adjustments are necessary, to model it as a random effect. This study provides a robust framework for multi-breed genomic prediction, highlighting the critical role of appropriately accounting for population structure. 
Moreover, our findings have important implications for improving genomic selection efficiency, ultimately enhancing commercial production by optimizing prediction accuracy in crossbred populations.
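
As a rough illustration of the PCA-based adjustment mentioned above, the sketch below derives principal component scores from a marker matrix via singular value decomposition. It assumes markers coded as 0/1/2 allele counts with no missing values; the number of components `k` and the function name are arbitrary choices for illustration and are not taken from the study. The resulting scores could then be entered into a prediction model as covariates, either fixed or random.

```python
import numpy as np

def structure_pcs(M, k=2):
    """Return the top-k principal component scores of a marker matrix.

    M: (n_individuals, n_markers) array of allele counts coded 0, 1, 2.
    Also returns the share of total marker variance captured by each PC.
    """
    M = np.asarray(M, dtype=float)
    Z = M - M.mean(axis=0)                        # center each marker
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * s[:k]                     # PC scores per individual
    explained = s[:k] ** 2 / np.sum(s ** 2)       # variance share per PC
    return scores, explained
```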

  • Research Article
  • Cite count: 15
  • 10.1186/s40104-020-00493-8
The superiority of multi-trait models with genotype-by-environment interactions in a limited number of environments for genomic prediction in pigs
  • Aug 19, 2020
  • Journal of Animal Science and Biotechnology
  • Hailiang Song + 2 more

Background: Different production systems and climates could lead to genotype-by-environment (G × E) interactions between populations, and the inclusion of G × E interactions is becoming essential in breeding decisions. The objective of this study was to investigate the performance of multi-trait models in genomic prediction in a limited number of environments with G × E interactions. Results: In total, 2,688 and 1,384 individuals with growth and reproduction phenotypes, respectively, from two Yorkshire pig populations with similar genetic backgrounds were genotyped with the PorcineSNP80 panel. Single- and multi-trait models with genomic best linear unbiased prediction (GBLUP) and BayesCπ were implemented to investigate their genomic prediction abilities with 20 replicates of five-fold cross-validation. Our results regarding between-environment genetic correlations of growth and reproductive traits (ranging from 0.618 to 0.723) indicated the existence of G × E interactions between these two Yorkshire pig populations. For single-trait models, genomic prediction with GBLUP was only 1.1% more accurate on average in the combined population than in single populations, and no significant improvements were obtained by BayesCπ for most traits. In addition, single-trait models with either GBLUP or BayesCπ produced greater bias for the combined population than for single populations. However, multi-trait models with GBLUP and BayesCπ better accommodated G × E interactions, yielding 2.2% – 3.8% and 1.0% – 2.5% higher prediction accuracies for growth and reproductive traits, respectively, compared to those for single-trait models of single populations and the combined population. The multi-trait models also yielded lower bias and larger gains in the case of a small reference population. The smaller improvement in prediction accuracy and larger bias obtained by the single-trait models in the combined population was mainly due to the low consistency of linkage disequilibrium between the two populations, which also caused the BayesCπ method to always produce the largest standard error in marker effect estimation for the combined population. Conclusions: In conclusion, our findings confirmed that directly combining populations to enlarge the reference population is not efficient in improving the accuracy of genomic prediction in the presence of G × E interactions, while multi-trait models perform better in a limited number of environments with G × E interactions.

  • Research Article
  • 10.1093/genetics/iyaf003
Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations.
  • Jan 16, 2025
  • Genetics
  • Patrick M Gibbs + 2 more

Genomic prediction applies to any agro- or ecologically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology including epistasis. However, the applicability of different genomic prediction models, including non-linear, non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates genomic prediction sensitivity to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits in 1,000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalized regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering time and yield, were strongly correlated to population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.

  • Research Article
  • Cite count: 1
  • 10.1002/tpg2.20486
Machine learning for genomic and pedigree prediction in sugarcane.
  • Jun 26, 2024
  • The plant genome
  • Minoru Inamori + 7 more

Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genomes. Considering non-additive genetic effects is essential in genome prediction (GP) models of crops with highly heterozygous polyploid genomes. This study incorporates non-additive genetic effects and pedigree information using machine learning methods to track sugarcane breeding lines and enhance the prediction by assessing the degree of association between genotypes. This study measured the stalk biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program. Subsequently, we conducted analyses based on the marker genotypes of 33,149 single-nucleotide polymorphisms. To validate the accuracy of GP in the population, we first predicted the prediction accuracy of the best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Prediction accuracy was assessed using two different cross-validation methods: repeated 10-fold cross-validation and leave-one-family-out cross-validation. The accuracy of GP of the first and second methods ranged from 0.36 to 0.74 and 0.15 to 0.63, respectively. Next, we compared the prediction accuracy of BLUP and two machine learning methods: random forests and simulation annealing ensemble (SAE), a newly developed machine learning method that explicitly models the interaction between variables. Both pedigree and genomic information were utilized as input in these methods. Through repeated 10-fold cross-validation, we found that the accuracy of the machine learning methods consistently surpassed that of BLUP in most cases. In leave-one-family-out cross-validation, SAE demonstrated the highest accuracy among the methods. These results underscore the effectiveness of GP in Japanese sugarcane breeding and highlight the significant potential of machine learning methods.
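
The leave-one-family-out scheme described above differs from random k-fold splitting in that all clones of one family are held out together, so the model is never trained on siblings of the test clones. A minimal sketch of how such splits could be generated follows, assuming one family label per clone; the function name and the example labels are hypothetical.

```python
from collections import defaultdict

def leave_one_family_out(families):
    """Yield (family, train_idx, test_idx) with each family held out once.

    families: sequence of family labels, one per individual.
    """
    by_family = defaultdict(list)
    for i, fam in enumerate(families):
        by_family[fam].append(i)

    for fam, test_idx in by_family.items():
        # Train on every individual that does not belong to the held-out family.
        train_idx = [i for i, f in enumerate(families) if f != fam]
        yield fam, train_idx, test_idx

# Example with made-up labels: five clones from three families.
for fam, train_idx, test_idx in leave_one_family_out(["A", "A", "B", "C", "B"]):
    print(fam, train_idx, test_idx)
```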

  • Research Article
  • Cite count: 15
  • 10.1371/journal.pone.0260997
Designing the best breeding strategy for Coffea canephora: Genetic evaluation of pure and hybrid individuals aiming to select for productivity and disease resistance traits
  • Dec 29, 2021
  • PLoS ONE
  • Emilly Ruas Alkimim + 8 more

Breeding programs of the species Coffea canephora rely heavily on the significant genetic variability between and within its two varietal groups (conilon and robusta). The use of hybrid families and individuals has been less common. The objectives of this study were to evaluate parents and families from the populations of conilon, robusta, and its hybrids and to define the best breeding and selection strategies for productivity and disease resistance traits. As such, 71 conilon clones, 56 robusta clones, and 20 hybrid families were evaluated over several years for the following traits: vegetative vigor, incidence of rust and cercosporiosis, fruit ripening time, fruit size, plant height, canopy diameter, and yield per plant. Components of variance and genetic parameters were estimated via residual maximum likelihood (REML) and genotypic values were predicted via best linear unbiased prediction (BLUP). Genetic variability among parents (clones) and hybrid families was detected for most of the evaluated traits. The Mulamba-Rank index suggests potential gains up to 17% for the genotypic aggregate of traits in the hybrid population. An intrapopulation recurrent selection within the hybrid population would be the best breeding strategy because the genetic variability, narrow and broad senses heritabilities and selective accuracies for important traits were maximized in the crossed population. Besides, such strategy is simple, low cost and quicker than the concurrent reciprocal recurrent selection in the two parental populations, and this maximizes the genetic gain for unit of time.

  • Research Article
  • Cite count: 48
  • 10.1534/g3.115.021105
Genome-Enabled Estimates of Additive and Nonadditive Genetic Variances and Prediction of Apple Phenotypes Across Environments
  • Oct 22, 2015
  • G3: Genes|Genomes|Genetics
  • Satish Kumar + 5 more

The nonadditive genetic effects may have an important contribution to total genetic variation of phenotypes, so estimates of both the additive and nonadditive effects are desirable for breeding and selection purposes. Our main objectives were to: estimate additive, dominance and epistatic variances of apple (Malus × domestica Borkh.) phenotypes using relationship matrices constructed from genome-wide dense single nucleotide polymorphism (SNP) markers; and compare the accuracy of genomic predictions using genomic best linear unbiased prediction models with or without including nonadditive genetic effects. A set of 247 clonally replicated individuals was assessed for six fruit quality traits at two sites, and also genotyped using an Illumina 8K SNP array. Across several fruit quality traits, the additive, dominance, and epistatic effects contributed about 30%, 16%, and 19%, respectively, to the total phenotypic variance. Models ignoring nonadditive components yielded upwardly biased estimates of additive variance (heritability) for all traits in this study. The accuracy of genomic predicted genetic values (GEGV) varied from about 0.15 to 0.35 for various traits, and these were almost identical for models with or without including nonadditive effects. However, models including nonadditive genetic effects further reduced the bias of GEGV. Between-site genotypic correlations were high (>0.85) for all traits, and genotype-site interaction accounted for <10% of the phenotypic variability. The accuracy of prediction, when the validation set was present only at one site, was generally similar for both sites, and varied from about 0.50 to 0.85. The prediction accuracies were strongly influenced by trait heritability, and genetic relatedness between the training and validation families.

  • Research Article
  • Cite count: 1
  • 10.1016/j.scienta.2023.111985
Genetic dissection of complex traits in citrus: additive and non-additive genetic variances, inbreeding depression, and single-chromosome heritability
  • Mar 22, 2023
  • Scientia Horticulturae
  • Atsushi Imai + 4 more

  • Research Article
  • Cite count: 6
  • 10.3389/fgene.2022.843300
Genomic Prediction Using LD-Based Haplotypes in Combined Pig Populations.
  • Jun 9, 2022
  • Frontiers in Genetics
  • Haoqiang Ye + 8 more

The size of reference population is an important factor affecting genomic prediction. Thus, combining different populations in genomic prediction is an attractive way to improve prediction ability. However, combining multireference population roughly cannot increase the prediction accuracy as well as expected in pig. This may be due to different linkage disequilibrium (LD) pattern differences between population. In this study, we used the imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in combined population to explore the impact of different single-nucleotide polymorphism (SNP) densities, variant representation (SNPs or haplotype alleles), and reference population size on the prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in multi-population but not within-population. Not only the genomic prediction accuracy of the haplotype method using 80 K chip data in multi-population but also GBLUP for the multi-population (3.4–5.9%) was higher than that within-population (1.2–4.3%). More importantly, we have found that using the haplotype method based on the WGS data in multi-population has better genomic prediction performance, and our results showed that building haploblock in this scenario based on low LD threshold (r² = 0.2–0.3) produced an optimal set of variables for reproduction traits in Yorkshire pig population. Our results suggested that whether the use of the haplotype method based on the chip data or GBLUP (individual SNP method) based on the WGS data were beneficial for genomic prediction in multi-population, while simultaneously combining the haplotype method and WGS data was a better strategy for multi-population genomic evaluation.
