Abstract
Background: Selection of an appropriate statistical significance threshold in genome-wide association studies is critical to differentiate true positives from false positives and false negatives. Different multiple-testing correction methods have been developed to determine the significance threshold; however, these methods may be overly conservative and may lead to an increase in false negatives. Here, we developed an empirical formula to determine the statistical significance threshold that is based on the marker-based heritability of the trait. To develop the formula, we used 45 simulated traits in soybean, maize, and rice that varied in both broad-sense heritability and the number of QTLs.

Results: A formula to determine a significance threshold was developed based on a regression equation that used one independent variable, marker-based heritability, and one response variable, −log10(P) values. For all species, the threshold −log10(P) values increased as both marker-based and broad-sense heritability increased. Higher broad-sense heritability in these crops resulted in higher significance threshold values. Among the crop species, maize, which has lower linkage disequilibrium, had higher significance threshold values than soybean and rice.

Conclusions: Our formula was less conservative and identified more true positive associations than the false discovery rate and Bonferroni correction methods.
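For context, the two baseline corrections named in the Conclusions can be expressed as −log10(P) cutoffs. The sketch below is illustrative only and is not the empirical formula developed in this study: the function names, the default alpha of 0.05, and the use of the Benjamini-Hochberg procedure as the false discovery rate method are assumptions made for this example.

```python
import numpy as np


def bonferroni_threshold(n_markers, alpha=0.05):
    """-log10(P) cutoff when alpha is divided by the number of markers tested."""
    return -np.log10(alpha / n_markers)


def bh_threshold(p_values, alpha=0.05):
    """-log10(P) cutoff implied by the Benjamini-Hochberg step-up procedure.

    Returns None when no marker passes the FDR criterion.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    # The k-th smallest P-value is compared against alpha * k / m.
    passed = p <= alpha * np.arange(1, m + 1) / m
    if not passed.any():
        return None
    # The largest P-value that passes defines the significance cutoff.
    largest = p[np.nonzero(passed)[0].max()]
    return -np.log10(largest)
```

Because the Bonferroni cutoff depends only on the number of markers, it stays the same regardless of the trait's heritability, which is the kind of rigidity a heritability-based threshold is meant to relax.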
Highlights
Selection of an appropriate statistical significance threshold in genome-wide association studies is critical to differentiate true positives from false positives and false negatives
In this study, we developed a method to determine the significance threshold for genome-wide association studies (GWAS) using 45 simulated phenotypic traits that varied in both broad-sense heritability and the number of quantitative trait loci (QTLs) in three crop species that differed in their linkage disequilibrium (LD) patterns
We repeated the simulation of these traits 10 times so that simulated QTLs were randomly assigned to different parts of the genome in order to obtain unbiased results
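The simulation design described in these highlights can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the genotype matrix is random 0/1/2 data rather than real soybean, maize, or rice markers, QTL effects are drawn from a standard normal distribution, and the function name `simulate_trait` and its parameters are hypothetical.

```python
import numpy as np


def simulate_trait(genotypes, n_qtl, h2, rng):
    """Simulate one phenotype with n_qtl randomly placed QTLs and target heritability h2."""
    n_ind, n_markers = genotypes.shape
    # Assign QTLs to random marker positions (no marker used twice).
    qtl_idx = rng.choice(n_markers, size=n_qtl, replace=False)
    effects = rng.normal(size=n_qtl)  # assumed effect-size distribution
    genetic = genotypes[:, qtl_idx] @ effects
    # Scale residual variance so that var_g / (var_g + var_e) equals h2.
    var_g = genetic.var()
    var_e = var_g * (1.0 - h2) / h2
    noise = rng.normal(scale=np.sqrt(var_e), size=n_ind)
    return genetic + noise, qtl_idx


rng = np.random.default_rng(0)
# Hypothetical genotype matrix: 200 individuals by 1000 markers coded 0/1/2.
genotypes = rng.integers(0, 3, size=(200, 1000))
# Ten replicates, each with QTLs reassigned to new random positions.
replicates = [simulate_trait(genotypes, n_qtl=5, h2=0.7, rng=rng) for _ in range(10)]
```

Redrawing the QTL positions in every replicate keeps any single genomic region, and its local LD structure, from dominating the results.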
Summary
Selection of an appropriate statistical significance threshold in genome-wide association studies is critical to differentiate true positives from false positives and false negatives. Since the publication of the mixed linear model (MLM) for GWAS [3], many MLM-based methods have been developed. All of these methods are single-locus: they test one marker at a time and therefore fail to match the true genetic model of complex traits, which are controlled by many loci simultaneously. To overcome this problem, multi-locus models, including FASTmrEMMA [5], ISIS EM-BLASSO [6], pLARmEB [7], pKWmEB [8], LASSO [9], and FarmCPU [10], have been developed.