The Big BIT maize experiment: A large multi‐location, multi‐year, multi‐tester, multi‐population predictive breeding validation study
The Big Breeding Innovation Team (Big BIT) maize (Zea mays L.) experiment was one of the largest genomic data‐informed predictive breeding validation studies ever conducted. The experiment was a multi‐location, multi‐year, multi‐tester, multi‐population study involving F1 maize hybrids created by crossing individual doubled haploids to inbred testers. The purpose of the study, performed by DuPont Pioneer/Corteva Agriscience in 2017, 2018, and 2019, was to build comprehensive datasets to help answer a wide range of practical questions focused on optimizing predictive breeding strategies in maize. The purpose of our study is to (1) describe the design and unique features of our study and (2) discuss learnings with practical implications for plant breeders. Since the same F1 maize hybrids were grown across three distinct years, we use basic descriptive summary statistics to discuss our learnings. We provide a technical justification for the use of basic statistics and discuss the expected theoretical prediction accuracy of genomic estimated breeding values (GEBVs) of Big BIT individuals and families, and predictive abilities obtained by performing large‐scale cross‐validations. Our study provides multi‐year field data‐based evidence that, for inbred/variety development focused plant improvement efforts, early‐stage genetic evaluation should be based on GEBVs generated from wide‐area testing training datasets. This holds true for candidates for selection with or without own phenotypic records.
- Research Article
15
- 10.1186/s13765-019-0438-0
- Jun 14, 2019
- Applied Biological Chemistry
Despite the relevance of drought stress, the regulation of gene expression, protein accumulation, and plant physiology under water-deficit stress is not well understood in Korean F1 maize (Zea mays L.) hybrids. In this study, we investigated the effect of water deficit on the F1 maize hybrids, Ilmichal (Ilmi) and Gwangpyeongok (GPOK), by withholding water for 10 days during flowering. Water deficit severely reduced the relative water content, area, SPAD values, and stomatal conductance of leaves, stem length, and the dry matter content of aerial tissues in drought-stressed plants of both hybrids. However, the dry matter content of roots was reduced only in GPOK. Two-dimensional gel electrophoresis identified 24 spots representing proteins accumulated to differential levels in well-watered and drought-stressed plants of both hybrids. Further analysis of protein spots using matrix assisted laser desorption ionization–time of flight mass spectrometry and protein database searches revealed that nine proteins were involved in carbohydrate metabolism, seven in stress response, and two in photosynthesis. Among these proteins, delta 3,5-delta 2,4-dienoyl-CoA isomerase (spot 8) and bifunctional 3-phosphoadenosine 5-phosphosulfate synthetase 2 (spot 23) were present only in GPOK, whereas NAD-dependent epimerase/dehydratase (spot 13), NAD(P)H-quinone oxidoreductase subunit 2 A (spot 24), and an uncharacterized protein (spot 19) were present only in Ilmi, in response to water-deficit stress. Semi-quantitative reverse transcription PCR analysis showed that the transcript levels of most of the genes encoding these proteins correlated well with their protein levels, suggesting that water deficit affects gene transcription in F1 maize hybrids at the flowering stage.
- Research Article
52
- 10.1534/g3.118.200091
- Feb 6, 2018
- G3: Genes|Genomes|Genetics
Genomic selection (GS) offers the possibility to estimate the effects of genome-wide molecular markers, which can be used to calculate genomic estimated breeding values (GEBVs) for individuals without phenotypes. GEBVs can serve as a selection criterion in recurrent GS, maximizing single-cycle but not necessarily long-term genetic gain. As simple genome-wide sums, GEBVs do not take into account other genomic information, such as the map positions of loci and linkage phases of alleles. Therefore, we herein propose a novel selection criterion called expected maximum haploid breeding value (EMBV). EMBV predicts the expected performance of the best among a limited number of gametes that a candidate contributes to the next generation, if selected. We used simulations to examine the performance of EMBV in comparison with GEBV as well as the recently proposed criterion optimal haploid value (OHV) and weighted GS. We considered different population sizes, numbers of selected candidates, chromosome numbers and levels of dominant gene action. Criterion EMBV outperformed GEBV after about 5 selection cycles, achieved higher long-term genetic gain and maintained higher diversity in the population. The other selection criteria showed the potential to surpass both GEBV and EMBV in advanced cycles of the breeding program, but yielded substantially lower genetic gain in early to intermediate cycles, which makes them unattractive for practical breeding. Moreover, they were largely inferior in scenarios with dominant gene action. Overall, EMBV shows high potential to be a promising alternative selection criterion to GEBV for recurrent genomic selection.
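The description of GEBVs as "simple genome-wide sums" can be illustrated with a minimal sketch. It assumes an additive model in which each genotype is coded as an allele dosage (0, 1, or 2) and marker effects have already been estimated; the function name, marker effects, and candidate lines below are hypothetical and do not come from the study.

```python
def gebv(genotypes, marker_effects):
    """Sum of allele dosages times estimated marker effects (additive model assumed)."""
    if len(genotypes) != len(marker_effects):
        raise ValueError("genotypes and marker_effects must have the same length")
    return sum(g * a for g, a in zip(genotypes, marker_effects))


# Hypothetical marker effects and allele dosages, for illustration only.
marker_effects = [0.5, -0.25, 1.0, 0.0]
candidates = {
    "line_A": [2, 0, 1, 2],
    "line_B": [0, 2, 2, 1],
}

# One GEBV per candidate; no phenotype of the candidate itself is needed.
gebvs = {name: gebv(geno, marker_effects) for name, geno in candidates.items()}
```

Because the sum ignores map positions and linkage phases, two candidates with the same GEBV can differ in the gametes they are able to produce, which is the gap that criteria such as EMBV aim to address.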
- Research Article
20
- 10.1186/s12711-019-0481-7
- Jul 8, 2019
- Genetics Selection Evolution
Background: Pig and poultry breeding programs aim at improving crossbred (CB) performance. Selection response may be suboptimal if only purebred (PB) performance is used to compute genomic estimated breeding values (GEBV) because the genetic correlation between PB and CB performance (r_{pc}) is often lower than 1. Thus, it may be beneficial to use information on both PB and CB performance. In addition, the accuracy of GEBV of PB animals for CB performance may improve when the breed-of-origin of alleles (BOA) is considered in the genomic relationship matrix (GRM). Thus, our aim was to compare scenarios where GEBV are computed and validated by using (1) either CB offspring averages or individual CB records for validation, (2) either a PB or CB reference population, and (3) a GRM that either accounts for or ignores BOA in the CB individuals. For this purpose, we used data on body weight measured at around 7 (BW7) or 35 (BW35) days in PB and CB broiler chickens and evaluated the accuracy of GEBV based on the correlation of GEBV with phenotypes in the validation population (validation correlation). Results: With validation on CB offspring averages, the validation correlation of GEBV of PB animals for CB performance was lower with a CB reference population than with a PB reference population for BW35 (r_{pc} = 0.96), and about equal for BW7 (r_{pc} = 0.80) when BOA was ignored. However, with validation on individual CB records, the validation correlation was higher with a CB reference population for both traits. The use of a GRM that took BOA into account increased the validation correlation for BW7 but reduced it for BW35. Conclusions: We argue that the benefit of using a CB reference population for genomic prediction of PB animals for CB performance should be assessed either by validation on CB offspring averages, or by validation on individual CB records while using a GRM that accounts for BOA in the CB individuals. 
With this recommendation in mind, our results show that the accuracy of GEBV of PB animals for CB performance was equal to or higher with a CB reference population than with a PB reference population for a trait with an r_{pc} of 0.8, but lower for a trait with an r_{pc} of 0.96. In addition, taking BOA into account was beneficial for a trait with an r_{pc} of 0.8 but not for a trait with an r_{pc} of 0.96.
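The validation correlation used in this abstract is the correlation of GEBV with phenotypes in the validation population. A minimal sketch follows, assuming a plain Pearson correlation; the values are made up for illustration and are not the broiler data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GEBV and observed phenotypes for four validation animals.
gebv_values = [1.0, 2.0, 3.0, 4.0]
phenotypes = [1.5, 1.0, 3.5, 3.0]

validation_correlation = pearson(gebv_values, phenotypes)
```

The same function applies whether the validation records are individual CB records or CB offspring averages; only the phenotype vector changes.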
- Research Article
8
- 10.1186/s12711-022-00752-4
- Sep 27, 2022
- Genetics, selection, evolution : GSE
Background: Although single-step GBLUP (ssGBLUP) is an animal model, SNP effects can be backsolved from genomic estimated breeding values (GEBV). Predicted SNP effects allow an indirect prediction (IP) to be computed for each individual as the sum of the SNP effects multiplied by its gene content, which is helpful when the number of genotyped animals is large, for genotyped animals not in the official evaluations, and when interim evaluations are needed. Typically, IP are obtained for new batches of genotyped individuals, all of them young and without phenotypes. Individual (theoretical) accuracies for IP are rarely reported, but they are nevertheless of interest. Our first objective was to present equations to compute the individual accuracy of IP, based on the prediction error covariance (PEC) of SNP effects, which, in turn, is obtained from the PEC of GEBV in ssGBLUP. The second objective was to test the algorithm for proven and young (APY) in PEC computations. With large datasets, it is impossible to handle the full PEC matrix, thus the third objective was to examine the minimum number of genotyped animals needed in PEC computations to achieve IP accuracies that are equivalent to GEBV accuracies. Results: Correlations between GEBV and IP for the validation animals using SNP effects from ssGBLUP evaluations were ≥ 0.99. When all available genotyped animals were used for PEC computations, correlations between GEBV and IP accuracy were ≥ 0.99. In addition, IP accuracies were compatible with GEBV accuracies either with direct inversion of the genomic relationship matrix (G) or using the algorithm for proven and young (APY) to obtain the inverse of G. 
As the number of genotyped animals included in the PEC computations decreased from around 55,000 to 15,000, correlations were still ≥ 0.96, but IP accuracies were biased downwards. Conclusions: Theoretical accuracy of indirect prediction can be successfully obtained by computing SNP PEC out of GEBV PEC from ssGBLUP equations using direct or APY G inverse. It is possible to reduce the number of genotyped animals in PEC computations, but accuracies may be underestimated. Further research is needed to approximate SNP PEC from ssGBLUP to limit the computational requirements with many genotyped animals.
- Research Article
24
- 10.3168/jds.2013-7821
- Jul 3, 2014
- Journal of Dairy Science
Assigning unknown parent groups to reduce bias in genomic evaluations of final score in US Holsteins
- Research Article
18
- 10.1007/s00122-015-2464-6
- Mar 4, 2015
- Theoretical and Applied Genetics
We evaluated several methods for computing shrinkage estimates of the genomic relationship matrix and demonstrated their potential to enhance the reliability of genomic estimated breeding values of training set individuals. In genomic prediction in plant breeding, the training set constitutes a large fraction of the total number of genotypes assayed and is itself subject to selection. The objective of our study was to investigate whether genomic estimated breeding values (GEBVs) of individuals in the training set can be enhanced by shrinkage estimation of the genomic relationship matrix. We simulated two different population types: a diversity panel of unrelated individuals and a biparental family of doubled haploid lines. For different training set sizes (50, 100, 200), number of markers (50, 100, 200, 500, 2,500) and heritabilities (0.25, 0.5, 0.75), shrinkage coefficients were computed by four different methods. Two of these methods are novel and based on measures of LD, the other two were previously described in the literature, one of which was extended by us. Our results showed that shrinkage estimation of the genomic relationship matrix can significantly improve the reliability of the GEBVs of training set individuals, especially for a low number of markers. We demonstrate that the number of markers is the primary determinant of the optimum shrinkage coefficient maximizing the reliability and we recommend methods eligible for routine usage in practical applications.
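Shrinkage estimation of a genomic relationship matrix can be sketched as a convex combination of the estimated matrix and a simple target. The sketch below assumes an identity-matrix target and a shrinkage coefficient chosen by hand; it does not reproduce any of the four methods the abstract mentions for computing the coefficient, and the function name and toy matrix are hypothetical.

```python
def shrink_grm(G, delta):
    """Return (1 - delta) * G + delta * I for a square matrix G (identity target assumed)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(G)
    if any(len(row) != n for row in G):
        raise ValueError("G must be square")
    return [
        [(1.0 - delta) * G[i][j] + (delta if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy 2 x 2 relationship matrix and a hand-picked shrinkage coefficient.
G = [[1.0, 0.5],
     [0.5, 1.0]]
shrunk = shrink_grm(G, 0.5)
```

With delta = 0 the matrix is unchanged, and with delta = 1 it collapses to the target, so the coefficient controls how strongly the estimated relationships between individuals are pulled toward zero.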
- Research Article
17
- 10.1186/1753-6561-5-s3-s15
- May 27, 2011
- BMC Proceedings
Background: The genomic estimated breeding values (GEBV) of the young individuals in the XIV QTL-MAS workshop dataset were predicted by three methods: best linear unbiased prediction with a trait-specific marker-derived relationship matrix (TABLUP), ridge regression best linear unbiased prediction (RRBLUP), and BayesB. Methods: The TABLUP method is identical to the conventional BLUP except that the numeric relationship matrix is replaced with a trait-specific marker-derived relationship matrix (TA). The TA matrix was constructed based on both marker genotypes and their estimated effects on the trait of interest. The marker effects were estimated in a reference population consisting of 2,326 individuals using RRBLUP and BayesB. The GEBV of individuals in the reference population as well as 900 young individuals were estimated using the three methods. Subsets of markers were selected to perform low-density marker genomic selection for the TABLUP method. Results: The correlations between GEBVs from different methods are over 0.95 in most scenarios. The correlations between BayesB using all markers and TABLUP using 200 or more selected markers to construct the TA matrix are higher than 0.98 in the candidate population. The accuracy of TABLUP is higher than 0.67 with 100 or more selected markers, which is nearly equal to the accuracy of BayesB with all markers. Conclusions: The TABLUP method performed nearly as well as the BayesB method on the common dataset. It also provides an alternative method to predict GEBV with low-density markers. TABLUP is therefore a promising method for genomic selection deserving further exploration.
- Research Article
66
- 10.3389/fpls.2019.01502
- Nov 22, 2019
- Frontiers in Plant Science
Genomic selection predicts the genomic estimated breeding values (GEBVs) of individuals not previously phenotyped. Several studies have investigated the accuracy of genomic predictions in maize but there is little empirical evidence on the practical performance of lines selected based on phenotype in comparison with those selected solely on GEBVs in advanced testcross yield trials. The main objectives of this study were to (1) empirically compare the performance of tropical maize hybrids selected through phenotypic selection (PS) and genomic selection (GS) under well-watered (WW) and managed drought stress (WS) conditions in Kenya, and (2) compare the cost–benefit analysis of GS and PS. For this study, we used two experimental maize data sets (stage I and stage II yield trials). The stage I data set consisted of 1492 doubled haploid (DH) lines genotyped with rAmpSeq SNPs. A subset of these lines (855) representing various DH populations within the stage I cohort was crossed with an individual single-cross tester chosen to complement each population. These testcross hybrids were evaluated in replicated trials under WW and WS conditions for grain yield and other agronomic traits, while the remaining 637 DH lines were predicted using the 855 lines as a training set. The second data set (stage II) consists of 348 DH lines from the first data set. Among these 348 best DH lines, 172 lines were selected solely based on GEBVs, and 176 lines were selected based on phenotypic performance. Each of the 348 DH lines was crossed with three common testers from complementary heterotic groups, and the resulting 1042 testcross hybrids and six commercial checks were evaluated in four to five WW locations and one WS condition in Kenya. For stage I trials, the cross-validated prediction accuracy for grain yield was 0.67 and 0.65 under WW and WS conditions, respectively. 
We found similar responses to selection using PS and GS for grain yield and other agronomic traits under WW and WS conditions. The top 15% of hybrids advanced through GS and PS gave 21%–23% higher grain yield under WW and 51%–52% more grain yield under WS than the mean of the checks. The GS reduced the cost by 32% over the PS with similar selection gains. We concluded that the use of GS for yield under WW and WS conditions in maize can produce selection candidates with similar performance as those generated from conventional PS, but at a lower cost, and therefore, should be incorporated into maize breeding pipelines to increase breeding program efficiency.
- Research Article
9
- 10.1093/jas/skab004
- Feb 1, 2021
- Journal of animal science
The stability of genomic evaluations depends on the amount of data and population parameters. When the dataset is large enough to estimate the value of nearly all independent chromosome segments (~10K in American Angus cattle), the accuracy and persistency of breeding values will be high. The objective of this study was to investigate changes in estimated breeding values (EBV) and genomic EBV (GEBV) across monthly evaluations for 1 yr in a large genotyped population of beef cattle. The American Angus data used included 8.2 million records for birth weight, 8.9 for weaning weight, and 4.4 for postweaning gain. A total of 10.1 million animals born until December 2017 had pedigree information, and 484,074 were genotyped. A truncated dataset included animals born until December 2016. To mimic a scenario with monthly evaluations, 2017 data were added 1 mo at a time to estimate EBV using best linear unbiased prediction (BLUP) and GEBV using single-step genomic BLUP with the algorithm for proven and young (APY) with core group fixed for 1 yr or updated monthly. Predictions from monthly evaluations in 2017 were contrasted with the predictions of the evaluation in December 2016 or the previous month for all genotyped animals born until December 2016 with or without their own phenotypes or progeny phenotypes. Changes in EBV and GEBV were similar across traits, and only results for weaning weight are presented. Correlations between evaluations from December 2016 and the 12 consecutive evaluations were ≥0.97 for EBV and ≥0.99 for GEBV. Average absolute changes for EBV were about two times smaller than for GEBV, except for animals with new progeny phenotypes (≤0.12 and ≤0.11 additive genetic SD [SDa] for EBV and GEBV). The maximum absolute changes for EBV (≤2.95 SDa) were greater than for GEBV (≤1.59 SDa). The average(maximum) absolute GEBV changes for young animals from December 2016 to January and December 2017 ranged from 0.05(0.25) to 0.10(0.53) SDa. 
Corresponding ranges for animals with new progeny phenotypes were from 0.05(0.88) to 0.11(1.59) SDa for GEBV changes. The average absolute change in EBV(GEBV) from December 2016 to December 2017 for sires with ≤50 progeny phenotypes was 0.26(0.14) and for sires with >50 progeny phenotypes was 0.25(0.16) SDa. Updating the core group in APY without adding data created an average absolute change of 0.07 SDa in GEBV. Genomic evaluations in large genotyped populations are as stable and persistent as the traditional genetic evaluations, with less extreme changes.
- Research Article
24
- 10.1016/j.bjane.2017.01.011
- Jul 1, 2017
- Brazilian Journal of Anesthesiology (English Edition)
Importance of using basic statistics adequately in clinical research
- Dissertation
- 10.53846/goediss-3914
- Feb 20, 2022
Estimation of Genetic Parameters and Evaluation of Breeding Program Designs with a Focus on Dairy Cattle in Low Input Production Systems
- Single Report
- 10.32747/2015.7594404.bard
- Mar 1, 2015
The main objectives of this research were to detect the specific polymorphisms responsible for observed quantitative trait loci and develop optimal strategies for genomic evaluations and selection for moderate (Israel) and large (US) dairy cattle populations. A joint evaluation using all phenotypic, pedigree, and genomic data is the optimal strategy. The specific objectives were: 1) to apply strategies for determination of the causative polymorphisms based on the “a posteriori granddaughter design” (APGD), 2) to develop methods to derive unbiased estimates of gene effects derived from SNP chips analyses, 3) to derive optimal single-stage methods to estimate breeding values of animals based on marker, phenotypic and pedigree data, 4) to extend these methods to multi-trait genetic evaluations and 5) to evaluate the results of long-term genomic selection, as compared to traditional selection. Nearly all of these objectives were met. The major achievements were: The APGD and the modified granddaughter designs were applied to the US Holstein population, and regions harboring segregating quantitative trait loci (QTL) were identified for all economic traits of interest. The APGD was able to find segregating QTL for all the economic traits analyzed, and confidence intervals for QTL location ranged from ~5 to 35 million base pairs. Genomic estimated breeding values (GEBV) for milk production traits in the Israeli Holstein population were computed by the single-step method and compared to results for the two-step method. The single-step method was extended to derive GEBV for multi-parity evaluation. Long-term analysis of genomic selection demonstrated that inclusion of pedigree data from previous generations may result in less accurate GEBV. 
Major conclusions are: Predictions using single-step genomic best linear unbiased prediction (GBLUP) were the least biased, and that method appears to be the best tool for genomic evaluation of a small population, as it automatically accounts for parental index and allows for inclusion of female genomic information without additional steps. None of the methods applied to the Israeli Holstein population were able to derive GEBV for young bulls that were significantly better than parent averages. Thus we confirm previous studies that the main limiting factor for the accuracy of GEBV is the number of bulls with genotypes and progeny tests. Although 36 of the grandsires included in the APGD were genotyped for the BovineHDBeadChip, which includes 777,000 SNPs, we were not able to determine the causative polymorphism for any of the detected QTL. The number of valid unique markers on the BovineHDBeadChip is not sufficient for a reasonable probability to find the causative polymorphisms. Complete resequencing of the genome of approximately 50 bulls will be required, but this could not be accomplished within the framework of the current project due to funding constraints. Inclusion of pedigree data from older generations in the derivation of GEBV may result in less accurate evaluations.
- Research Article
26
- 10.1007/s11434-011-4632-7
- Aug 14, 2011
- Chinese Science Bulletin
Genomic selection (GS) is a marker-assisted selection method, in which high density markers covering the whole genome are used simultaneously for individual genetic evaluation via genomic estimated breeding values (GEBVs). GS can increase the accuracy of selection, shorten the generation interval by selecting individuals at the early stage of life, and accelerate genetic progress. With the availability of high density whole genome SNP (single nucleotide polymorphism) chips for livestock, GS is reshaping the conventional animal breeding systems. In many countries, GS is becoming the major genetic evaluation method for bull selection in dairy cattle and GS may soon completely replace the traditional genetic evaluation system. In recent years, GS has become an important research topic in animal, plant and aquaculture breeding and many exciting results have been reported. In this paper, the methods for obtaining GEBVs, factors affecting the accuracy of GEBVs, and the current status of implementation of GS in livestock are reviewed. Some unresolved issues related to GS in livestock are also discussed.
- Research Article
4
- 10.3168/jds.2022-23135
- Sep 9, 2023
- Journal of Dairy Science
The productivity of smallholder dairy farms is very low in developing countries. Important genetic gains could be realized using genomic selection, but genetic evaluations need to be tailored for lack of pedigree information and very small farm sizes. To accommodate this situation, we propose a flexible Bayesian model for the genetic evaluation of milk yield, which allows us to simultaneously account for nongenetic random effects for farms and varying SNP variance (BayesR model). First, we used simulations based on real genotype data from Indian crossbred dairy cattle to demonstrate that the proposed model can separate the true genetic and nongenetic parameters even for small farm sizes (2 cows on average) although with high standard errors in scenarios with low heritability. The accuracy of genomic genetic evaluation increased until farm size was approximately 5. We then applied the model to real data from 4,655 crossbred cows with 106,109 monthly test day milk records and 689,750 autosomal SNPs. We estimated a heritability of 0.16 (0.04) for milk yield and using cross-validation, a genomic estimated breeding value (GEBV) accuracy of 0.45 and bias (regression of phenotype on GEBV) of 1.04 (0.26). Estimated genetic parameters were very similar using BayesR, BayesC, and genomic BLUP approaches. Candidate genes near the top variants, IMMP2L and ARHGEF2, have been previously associated with milk protein composition, mastitis resistance, and milk cholesterol content. The estimated heritability and GEBV accuracy for milk yield are much lower than those from intensive or pasture-based systems in many countries. Further increases in the number of phenotyped and genotyped animals in farms with at least 2 cows (preferably 3–5, to allow for dropout of cows) are needed to improve the estimation of genetic effects in these smallholder dairy farms.
- Research Article
26
- 10.1016/j.bjan.2017.01.003
- Nov 1, 2017
- Brazilian Journal of Anesthesiology
Importance of using basic statistics adequately in clinical research