Mean and variance heterogeneity loci impact kernel compositional traits in maize
Maize (Zea mays) kernel composition is critical for food, feed, and industrial applications. Improving traits such as starch, protein, oil, fiber, and ash requires understanding their genetic basis. We conducted genome‐wide association study (GWAS) and variance genome‐wide association study (vGWAS) analyses using 954 inbred lines from the USDA‐ARS North Central Regional Plant Introduction Station collection to identify loci influencing both trait means and variability. We detected 10 significant single nucleotide polymorphisms (SNPs) associated with five kernel traits, some of which colocalized with known genes such as waxy1 and gras7. vGWAS uncovered additional loci not detected by standard GWAS, highlighting its value as a complementary tool. Genomic selection models, including ridge‐regression best linear unbiased prediction, reproducing kernel Hilbert space, and random forest, achieved moderate prediction accuracies (0.41–0.55), with parametric and semi‐parametric models showing less prediction bias. Although our dataset was derived from unreplicated genebank seed, key findings, particularly for protein and starch, were consistent with results from replicated field trials, supporting the utility of genebank‐derived high‐quality samples for initial genomic analysis. These results highlight the potential for using existing seed resources and high‐throughput phenotyping to identify candidate loci and prioritize traits for future replicated validation.
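As a rough illustration of what a variance-heterogeneity scan looks for, the sketch below computes a Brown-Forsythe statistic for a single SNP. The choice of test is an assumption made for illustration (the abstract does not say which vGWAS test was used), and the function name and toy phenotypes are hypothetical.

```python
from statistics import mean, median

def brown_forsythe_f(groups):
    """Brown-Forsythe statistic: one-way ANOVA F on |y - group median|."""
    # Absolute deviation of every observation from its own group's median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    # Between-group and within-group mean squares of the deviations.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n_total - k)
    return between / within

# Hypothetical toy data: phenotypes of inbred lines split by allele at one SNP.
ref_allele = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]  # low spread
alt_allele = [10.0, 12.0, 8.0, 11.5, 8.5, 10.0]  # same mean, high spread
f_stat = brown_forsythe_f([ref_allele, alt_allele])
print(f"Brown-Forsythe F = {f_stat:.2f}")
```

In this toy case both allele groups have the same mean, so a mean-based test would see nothing, while the spread differs clearly. That is the kind of locus a variance scan can add to a standard GWAS.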
- North Central Regional Plant Introduction Station
- Genome‐wide Association Studies
- Reproducing Kernel Hilbert Space
- Significant Single Nucleotide Polymorphisms
- Genomic Selection Models
- Traits In Maize
- Kernel Traits
- Linear Unbiased Prediction
- Research Article
121
- 10.1371/journal.pone.0067066
- Jun 20, 2013
- PLoS ONE
Stalk strength is an important trait in maize (Zea mays L.). Strong stalks reduce lodging and maximize harvestable yield. Studies show rind penetrometer resistance (RPR), or the force required to pierce a stalk rind with a spike, is a valid approximation of strength. We measured RPR across 4,692 recombinant inbreds (RILs) comprising the maize nested association mapping (NAM) panel derived from crosses of diverse inbreds to the inbred, B73. An intermated B73×Mo17 family (IBM) of 196 RILs and a panel of 2,453 diverse inbreds from the North Central Regional Plant Introduction Station (NCRPIS) were also evaluated. We measured RPR in three environments. Family-nested QTL were identified by joint-linkage mapping in the NAM panel. We also performed a genome-wide association study (GWAS) and genomic best linear unbiased prediction (GBLUP) in each panel. Broad sense heritability computed on a line means basis was low for RPR. Only 8 of 26 families had a heritability above 0.20. The NCRPIS diversity panel had a heritability of 0.54. Across NAM and IBM families, 18 family-nested QTL and 141 significant GWAS associations were identified for RPR. Numerous weak associations were also found in the NCRPIS diversity panel. However, few were linked to loci involved in phenylpropanoid and cellulose synthesis or vegetative phase transition. Using an identity-by-state (IBS) relationship matrix estimated from 1.6 million single nucleotide polymorphisms (SNPs) and RPR measures from 20% of the NAM panel, genomic prediction by GBLUP explained 64±2% of variation in the remaining RILs. In the NCRPIS diversity panel, an IBS matrix estimated from 681,257 SNPs and RPR measures from 20% of the panel explained 33±3% of variation in the remaining inbreds. These results indicate the high genetic complexity of stalk strength and the potential for genomic prediction to hasten its improvement.
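The identity-by-state relationship matrix mentioned above can be sketched for a handful of lines as follows. This is a minimal illustration, assuming genotypes coded as 0/1/2 allele counts and IBS defined as the average proportion of shared alleles; the function name and toy genotypes are hypothetical, and the study's actual pipeline is not described in the abstract.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity for genotypes coded as 0/1/2 allele counts."""
    n = len(genotypes)
    m = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each marker contributes 2, 1, or 0 shared alleles.
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2 * m)
    return K

# Hypothetical toy panel: three inbred lines scored at four SNPs.
lines = [
    [0, 2, 2, 0],
    [0, 2, 0, 0],
    [2, 0, 0, 2],
]
K = ibs_matrix(lines)
for row in K:
    print(row)
```

In a GBLUP analysis a matrix like `K` would serve as the covariance structure among lines, so that phenotypes from the training fraction inform predictions for related, unphenotyped lines.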
- Research Article
7
- 10.3390/ani13243871
- Dec 15, 2023
- Animals : an Open Access Journal from MDPI
Simple Summary: By integrating prior biological information into genomic selection methods using appropriate models, it is possible to improve prediction accuracy for complex traits. In this context, we conducted a comparative assessment of two genomic prediction models, namely, genomic best linear unbiased prediction and genomic feature best linear unbiased prediction. The accuracy of these models in predicting the growth traits of backfat thickness and loin muscle area was evaluated. Our results revealed that the genomic feature best linear unbiased prediction model can effectively integrate prior information into the model, which is superior to the genomic best linear unbiased prediction model in some cases. These findings provide valuable ideas for enhancing the genomic prediction accuracy of growth traits in pigs.
Enhancing the accuracy of genomic prediction is a key goal in genomic selection (GS) research. Integrating prior biological information into GS methods using appropriate models can improve prediction accuracy for complex traits. Genome-wide association study (GWAS) is widely utilized to identify potential candidate loci associated with complex traits in livestock and poultry, offering essential genomic insights. In this study, a GWAS was conducted on 685 Duroc × Landrace × Yorkshire (DLY) pigs to extract significant single-nucleotide polymorphisms (SNPs) as genomic features. We compared two GS models, genomic best linear unbiased prediction (GBLUP) and genomic feature BLUP (GFBLUP), by using imputed whole-genome sequencing (WGS) data on 651 Yorkshire pigs. The results revealed that the GBLUP model achieved prediction accuracies of 0.499 for backfat thickness (BFT) and 0.423 for loin muscle area (LMA). By applying the GFBLUP model with GWAS-based SNP preselection, the average prediction accuracies for BFT and LMA traits reached 0.491 and 0.440, respectively. Specifically, the GFBLUP model displayed a 4.8% enhancement in predicting LMA compared to the GBLUP model. These findings suggest that, in certain scenarios, the GFBLUP model may offer superior genomic prediction accuracy when compared to the GBLUP model, underscoring the potential value of incorporating genomic features to refine GS models.
- Research Article
29
- 10.1186/s12711-017-0338-x
- Aug 24, 2017
- Genetics, Selection, Evolution : GSE
Background: A quantitative trait is controlled both by major variants with large genetic effects and by minor variants with small effects. Genome-wide association studies (GWAS) are an efficient approach to identify quantitative trait loci (QTL), and genomic selection (GS) with high-density single nucleotide polymorphisms (SNPs) can achieve higher accuracy of estimated breeding values than conventional best linear unbiased prediction (BLUP). GWAS and GS address different aspects of quantitative traits, but, as statistical models, they are quite similar in their description of the genetic mechanisms that underlie quantitative traits.
Methods: Here, we propose a stepwise linear regression mixed model (StepLMM) to unify GWAS and GS in a single statistical model. First, the variance components of the genomic-BLUP (GBLUP) model are estimated. Then, in the SNP selection step, the linear mixed model (LMM) for GWAS is equivalently transformed into a simple linear regression to improve computation speed, and the most significant SNP is selected and included in the evaluation model. In the SNP dropping step, the SNPs in the evaluation model are tested according to the standard errors of their estimated effects. If non-significant SNPs are present, the least significant one is dropped from the model and variance components are re-estimated. We used extended Bayesian information criteria (eBIC) to evaluate the model optimization, i.e. the model with the smallest eBIC is the final one and includes only significant SNPs.
Results: We simulated scenarios with different heritabilities with 100 QTL. StepLMM estimated heritability accurately and mapped QTL precisely. Genomic prediction accuracy was much higher with StepLMM than with GBLUP. The comparison of StepLMM with other GWAS and GS methods based on a dataset from the 16th QTLMAS Workshop showed that StepLMM had medium mapping power, the lowest rate of false positives for QTL mapping, and the highest accuracy for genomic prediction.
Conclusions: StepLMM is a combination of GWAS and GBLUP. GWAS and GBLUP are beneficial to each other in a single statistical model: GWAS improves genomic prediction accuracy, while GBLUP increases mapping precision and decreases the rate of false positives of GWAS. StepLMM has a high performance in both GWAS and GS and is feasible for agricultural breeding programs and human genetic studies.
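The model-choice rule described in the Methods (keep the model with the smallest eBIC) can be sketched as below. This is not the StepLMM implementation: it assumes a plain Gaussian regression, the commonly used eBIC form n·log(RSS/n) + k·log(n) + 2γ·log C(p, k), and hypothetical residual sums of squares for three candidate models.

```python
import math

def ebic(rss, n, k, p, gamma=1.0):
    """Extended BIC for a Gaussian regression with k of p candidate SNPs."""
    # log of the binomial coefficient C(p, k), computed stably via lgamma.
    log_binom = (math.lgamma(p + 1) - math.lgamma(k + 1)
                 - math.lgamma(p - k + 1))
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom

# Hypothetical candidates: label -> (residual sum of squares, SNPs in model).
n_individuals, n_snps = 200, 10_000
candidates = {"0 SNPs": (120.0, 0), "1 SNP": (80.0, 1), "2 SNPs": (79.5, 2)}

scores = {label: ebic(rss, n_individuals, k, n_snps)
          for label, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The second SNP barely lowers the residual sum of squares, so the penalty for searching among 10,000 markers outweighs the gain and the one-SNP model is retained. Setting `gamma=0.0` recovers the ordinary BIC.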
- Research Article
1
- 10.21273/hortsci.40.4.1063a
- Jul 1, 2005
- HortScience
Echinacea is becoming a well-established, high-value crop, both as an ornamental and a dietary supplement. A comprehensive collection of Echinacea germplasm is conserved by the USDA-ARS North Central Regional Plant Introduction Station (NCRPIS) in Ames, Iowa, and is available via seed distribution for research and educational purposes (ars-grin.gov/npgs). Representing all nine species collected throughout their respective North American geographic ranges, the Echinacea collection includes 179 accessions. Extensive morphological characterization data associated with this collection have been compiled and are available to researchers on the Germplasm Resources Information Network (GRIN) database to aid in selection criteria. The collection has been used extensively for various research projects, ranging from ornamental breeding studies to HPLC analyses of metabolites of interest to the phytopharmaceutical industry. This poster will summarize the Echinacea collection conserved at the NCRPIS, including a list of available accessions by species, illustrations of seed, and control-pollinated cage propagation methods; and facilities utilized for seed cleaning, testing, and storage. In addition, instructions on how to use the GRIN database to view evaluation data and acquire germplasm will be provided.
- Research Article
14
- 10.1023/a:1023575227094
- May 1, 2003
- Genetic Resources and Crop Evolution
Understanding the patterns of distribution of plant genetic resources, especially the extent and contextual bases of distributions, may be critical in setting appropriate targets for seed multiplication, packaging, storage space, and other technical operations. We analyzed germplasm distribution patterns over a 12-year period for 10 crop collections conserved by the North Central Regional Plant Introduction Station in Ames, Iowa, to determine if distribution rates over a given time interval help predict future distributions and to document how distribution patterns vary among accessions within collections. We demonstrated that, with an appropriate tracking system and commonly available statistical software, germplasm distribution patterns can be easily analyzed and plotted over time. Data measured over periods of up to 3 years had little predictive value, while a 6-year period gave relatively accurate projections of future distributions. Patterns of distributions within collections varied between those that are approximately normally distributed and those best described by an exponential function, with larger collections tending to be non-normally distributed. Means and standard deviations of standardized, long-term distribution rates, calculated from samples of 200-700 accessions, accurately described the distributional rates of 90-95% of all accessions. The documentation of changes in usage patterns within and among collections as they mature is also discussed. Analysis of average shipment size suggests that germplasm distributions became more focused over time for 8 of the 10 collections analyzed. This may result when users request germplasm based upon knowledge about specific accessions gained through personal experience and by examining evaluation and characterization data.
- Discussion
4
- 10.1016/j.jhep.2022.10.032
- Nov 10, 2022
- Journal of Hepatology
Assessing causal relationship between non-alcoholic fatty liver disease and risk of atrial fibrillation
- Research Article
9
- 10.1007/s11032-023-01423-y
- Nov 1, 2023
- Molecular Breeding : New Strategies in Plant Improvement
Accurately identifying varieties with targeted agronomic traits was thought to contribute to genetic selection and accelerate rice breeding progress. Genomic selection (GS) is a promising technique that uses markers covering the whole genome to predict the genomic-estimated breeding values (GEBV), with the ability to select before phenotypes are measured. To choose the appropriate GS models for breeding work, we analyzed the predictability of nine agronomic traits measured from a population of 459 diverse rice varieties. By comparing eight representative GS models, we found that the prediction accuracies ranged from 0.407 to 0.896, with reproducing kernel Hilbert space (RKHS) having the highest predictive ability for most traits. Further results demonstrated that the predictive ability of GS is altered by several factors. Moreover, we assessed the method of integrating genome-wide association study (GWAS) into various GS models. The predictabilities of GS combined with peak-associated markers generated from six different GWAS models were significantly different; a recommendation of Mixed Linear Model (MLM)-RKHS was given for the GWAS-GS-integrated prediction. Finally, based on the above result, we experimented with applying the P-values obtained from optimal GWAS models to ridge regression best linear unbiased prediction (rrBLUP), which benefited the traits with low predictability in rice.
- Discussion
19
- 10.1016/s1474-4422(22)00395-7
- Oct 18, 2022
- The Lancet Neurology
Diabetes and Alzheimer's disease: shared genetic susceptibility?
- Research Article
57
- 10.1519/jsc.0000000000003259
- Sep 1, 2019
- Journal of Strength and Conditioning Research
Pickering, C, Suraci, B, Semenova, EA, Boulygina, EA, Kostryukova, ES, Kulemin, NA, Borisov, OV, Khabibova, SA, Larin, AK, Pavlenko, AV, Lyubaeva, EV, Popov, DV, Lysenko, EA, Vepkhvadze, TF, Lednev, EM, Leońska-Duniec, A, Pająk, B, Chycki, J, Moska, W, Lulińska-Kuklik, E, Dornowski, M, Maszczyk, A, Bradley, B, Kana-ah, A, Cięszczyk, P, Generozov, EV, and Ahmetov, II. A genome-wide association study of sprint performance in elite youth football players. J Strength Cond Res 33(9): 2344-2351, 2019.
Sprint speed is an important component of football performance, with teams often placing a high value on sprint and acceleration ability. The aim of this study was to undertake the first genome-wide association study to identify genetic variants associated with sprint test performance in elite youth football players and to further validate the obtained results in additional studies. Using micro-array data (600 K-1.14 M single nucleotide polymorphisms [SNPs]) of 1,206 subjects, we identified 12 SNPs with suggestive significance after passing replication criteria. The polymorphism rs55743914 located in the PTPRK gene was found as the most significant for 5-m sprint test (p = 7.7 × 10). Seven of the discovered SNPs were also associated with sprint test performance in a cohort of 126 Polish women, and 4 were associated with power athlete status in a cohort of 399 elite Russian athletes. Six SNPs were associated with muscle fiber type in a cohort of 96 Russian subjects. We also examined genotype distributions and possible associations for 16 SNPs previously linked with sprint performance. Four SNPs (AGT rs699, HSD17B14 rs7247312, IGF2 rs680, and IL6 rs1800795) were associated with sprint test performance in this cohort. In addition, the G alleles of 2 SNPs in ADRB2 (rs1042713 & rs1042714) were significantly over-represented in these players compared with British and European controls. These results suggest that there is a genetic influence on sprint test performance in footballers, and identify some of the genetic variants that help explain this influence.
- Research Article
22
- 10.1161/circgenetics.108.843946
- Apr 1, 2009
- Circulation: Cardiovascular Genetics
The sequencing of the human genome, the identification of common single-nucleotide polymorphisms (SNPs) and haplotype blocks, and advances in microarray technology have enabled the study of complex diseases at a level of detail not previously imaginable. These have aided in the design and analyses of association and linkage studies of many complex diseases including cardiovascular disease. Recent technological advances have enabled the undertaking of large-scale genome-wide association studies (GWAS) that can assay hundreds of thousands of polymorphic sites on hundreds to thousands of individuals to find genomic regions associated with disease. Although results from these experiments enable the identification of smaller regions of association compared with previous studies, as with all linkage and association studies, there is the need for the further investigation of regions of interest for the causal genes or variants. The purpose of this review is to present a detailed demonstration as to how publicly available resources can be used to easily guide more detailed research into genomic regions of interest identified in linkage and association study data. Large-scale projects, such as the Human Genome Sequencing project,1,2 have generated large volumes and varieties of annotated genomic data necessitating the development of Internet-based tools to organize and make practically available these public data. One important tool in human disease research is the web-based graphical genome browsers that use the human genome sequence as the framework on which to organize genomic annotations, providing various ways for researchers to view and extract important information. 
Currently, there are 3 human genome browsers that have been developed for public use: (1) the National Center for Biotechnology Information (NCBI) Map Viewer3; (2) the University of California Santa Cruz (UCSC) Genome Browser4; and (3) the European Bioinformatics Institute’s Ensembl system.5 Although these genome browsers share common features and …
- Research Article
8
- 10.1109/access.2020.3002923
- Jan 1, 2020
- IEEE Access
SAERMA: Stacked Autoencoder Rule Mining Algorithm for the Interpretation of Epistatic Interactions in GWAS for Extreme Obesity
- Research Article
23
- 10.3389/fpls.2021.690059
- Jul 15, 2021
- Frontiers in Plant Science
High yield is the primary objective of maize breeding. Genomic dissection of grain yield and yield-related traits contributes to understanding yield formation and improving the yield of maize. In this study, two genome-wide association study (GWAS) methods and genomic prediction were applied to an association panel of 309 inbred lines. GWAS analyses revealed 22 significant trait–marker associations for grain yield per plant (GYP) and yield-related traits. Genomic prediction analyses showed that reproducing kernel Hilbert space (RKHS) outperformed the other four models based on GWAS-derived markers for GYP, ear weight, kernel number per ear and row, ear length, and ear diameter, whereas genomic best linear unbiased prediction (GBLUP) showed a slight superiority over other models in most subsets of the trait-associated marker (TAM) for thousand kernel weight and kernel row number. The prediction accuracy could be improved when significant single-nucleotide polymorphisms were fitted as the fixed effects. Integrating information on population structure into the fixed model did not improve the prediction performance. For GYP, the prediction accuracy of TAMs derived from fixed and random model Circulating Probability Unification (FarmCPU) was comparable to that of the compressed mixed linear model (CMLM). For yield-related traits, CMLM-derived markers provided better accuracies than FarmCPU-derived markers in most scenarios. Compared with all markers, TAMs could effectively improve the prediction accuracies for GYP and yield-related traits. For eight traits, moderate and high prediction accuracies were achieved using TAMs. Taken together, genomic prediction incorporating prior information detected by GWAS could be a promising strategy to improve the grain yield of maize.
- Conference Article
4
- 10.1109/icmla.2018.00012
- Dec 1, 2018
Extensive genetic and phenotypic research is necessary for any effective plant breeding program. Such studies, however, require an immense amount of time and resources. In order to expedite the breeding process, we provide a novel method for rapid genotype prediction using in-situ images of plants. In this method, significant single nucleotide polymorphisms (SNPs) are first identified using a novel autoencoder framework with the goal of being more robust to false positive associations than standard genome wide association studies (GWAS). On-field images of various plant varieties are then used to train Convolutional Neural Networks (CNNs) to predict candidate alleles and validate phenotypic relationships. This image-based system allows for easy use on new plant varieties to gain real-time genetic information for better harvest prediction. The feasibility of our method for rapid genotype prediction was demonstrated on 345 Sorghum bicolor varieties with corresponding uncontrolled images 60 days after seed planting. Our autoencoder identified 4 significant SNPs that had an average allele classification accuracy of 70.58% on 68 previously unseen plant varieties.
- Research Article
2
- 10.1002/aro2.87
- Oct 1, 2024
- Animal Research and One Health
Spotted sea bass (Lateolabrax maculatus) is a species of significant economic importance in aquaculture. However, genetic degeneration, such as declining growth performance, has severely impeded industry development, necessitating urgent genetic improvement. Here, we conducted a genome‐wide association study (GWAS) and genomic prediction for growth traits using insertion and deletion (InDel) markers, and systematically compared the results with our previous studies using single nucleotide polymorphism (SNP) markers. A total of 97 significant InDels including a 6 bp insertion in an exon region were identified. It is worth noting that only 5 and 1 candidate genes for DY and TS populations were also detected in previous GWAS using SNPs, and numerous novel genes including c4b, fgf4, and dnajb9 were identified as vital candidate genes. Moreover, several novel growth‐related procedures, such as the growth and development of the bone and muscle, were also detected. These findings indicated that InDel‐based GWAS can provide a valuable complement to SNP‐based studies. The comparison of genomic predictive performance for total length trait under different marker selection strategies and genomic selection models indicated that the GWAS selection strategy exhibits more stable predictive performance compared to the evenly selection strategy. Additionally, the support vector machine model demonstrated better predictive accuracy and efficiency than traditional best linear unbiased prediction and Bayes models. Furthermore, the superior predictive performance using InDel markers compared to SNP markers highlighted the potential of InDels to enhance genomic predictive accuracy and efficiency. Our results carry significant implications for dissecting genetic mechanisms and contributing to genetic improvement of growth traits in spotted sea bass through genomic resources.
- Research Article
33
- 10.3389/fpls.2020.593897
- Nov 27, 2020
- Frontiers in Plant Science
Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. For this, the following methods of Machine Learning (ML) were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN) both in combination with different hyperparameters. These ML methods were also compared with Genomic Best Linear Unbiased Prediction (GBLUP) and different Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models, using Rectified Linear Units (as the activation function), had higher predictive ability values, which varied from 0.27 (pilodyn penetration of 6 years old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the larger mini-batch size (100%) had a significantly higher predictive ability for wood-related traits than the smaller mini-batch size (10%). On the other hand, in the BRNN method, the architectures of one and two layers that used only the pureline function showed better results of prediction, with values ranging from 0.21 (pilodyn penetration) to 0.71 (flowering traits). A significant increase in the prediction ability was observed for DL in comparison with other methods of genomic prediction (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.