Abstract

To date, more than 3700 genome-wide association studies (GWAS) have been published that examine the genetic contributions of single nucleotide polymorphisms (SNPs) to human conditions or phenotypes. Through these studies, many highly significant SNPs have been identified for hundreds of diseases or medical conditions. However, the extent to which GWAS-identified SNPs or combinations of SNP biomarkers can predict disease risk is not well known. One of the most commonly used approaches to assess the performance of predictive biomarkers is to determine the area under the receiver-operator characteristic curve (AUROC). We have developed an R package called G-WIZ to generate ROC curves and calculate the AUROC using summary-level GWAS data. We first tested the performance of G-WIZ against AUROC values derived from patient-level SNP data, as well as literature-reported AUROC values. We found that G-WIZ predicts the AUROC with <3% error. Next, we used the summary-level GWAS data from GWAS Central to determine the ROC curves and AUROC values for 569 different GWA studies spanning 219 different conditions. Using these data, we found a small number of GWA studies with SNP-derived risk predictors that have very high AUROCs (>0.75). On the other hand, the average GWA study produces a multi-SNP risk predictor with an AUROC of 0.55. Detailed AUROC comparisons indicate that most SNP-derived risk predictions are not as good as clinically based disease risk predictors. All our calculations (ROC curves, AUROCs, explained heritability) are available in a publicly accessible database called GWAS-ROCS (http://gwasrocs.ca). The G-WIZ code is freely available for download at https://github.com/jonaspatronjp/GWIZ-Rscript/.

Highlights

  • A total of 3307 genome-wide association study (GWAS) publications, corresponding to 6137 GWAS summaries, were collected.

  • Over the past decade, several approaches to calculating receiver operating characteristic (ROC) curves and area under the ROC curve (AUROC) values from summary-level single nucleotide polymorphism (SNP) data have appeared [8,9,10,11,12,13]. We found that these methods were inaccurate, very limited in their capabilities, or required more information than is available in standard GWAS Central summary data. To overcome these issues, we developed a novel approach to accurately generate ROC curves and to calculate the AUROC for different SNP combinations using the summary-level data that is standardly found in GWAS databases.

  • We found that G-WIZ predicts the AUROC with <3% error.
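To make the idea of deriving an AUROC from summary-level data concrete, here is a minimal Python sketch. It is not the G-WIZ algorithm, whose details are not given here. It assumes that each SNP's RAF is the risk allele frequency in controls, that the OR is a per-allele (allelic) odds ratio, that SNPs are independent and in Hardy-Weinberg equilibrium, and that risk is scored as the sum of risk-allele counts weighted by ln(OR). All function names and the example values are hypothetical.

```python
import math
import random
from bisect import bisect_left, bisect_right


def case_allele_freq(raf, odds_ratio):
    """Risk allele frequency in cases implied by a control RAF and an allelic OR."""
    return odds_ratio * raf / (1.0 - raf + odds_ratio * raf)


def simulate_scores(freqs, weights, n, rng):
    """Simulate n individuals and return their weighted risk-allele scores."""
    scores = []
    for _ in range(n):
        score = 0.0
        for p, w in zip(freqs, weights):
            # Two independent allele draws per SNP (Hardy-Weinberg assumption).
            allele_count = (rng.random() < p) + (rng.random() < p)
            score += allele_count * w
        scores.append(score)
    return scores


def auroc(case_scores, control_scores):
    """Mann-Whitney estimate of P(case score > control score); ties count 0.5."""
    controls = sorted(control_scores)
    total = 0.0
    for s in case_scores:
        below = bisect_left(controls, s)
        tied = bisect_right(controls, s) - below
        total += below + 0.5 * tied
    return total / (len(case_scores) * len(controls))


def estimate_auroc(snps, n_per_group=5000, seed=0):
    """snps: list of (raf, odds_ratio) pairs taken from a GWAS summary table."""
    rng = random.Random(seed)
    weights = [math.log(odds_ratio) for _, odds_ratio in snps]
    control_freqs = [raf for raf, _ in snps]
    case_freqs = [case_allele_freq(raf, odds_ratio) for raf, odds_ratio in snps]
    controls = simulate_scores(control_freqs, weights, n_per_group, rng)
    cases = simulate_scores(case_freqs, weights, n_per_group, rng)
    return auroc(cases, controls)


# Made-up (RAF, OR) values for illustration only; not taken from any study.
example_snps = [(0.30, 1.30), (0.15, 1.20), (0.45, 1.10)]
print(f"Estimated AUROC: {estimate_auroc(example_snps):.3f}")
```

With modest odds ratios like these, the simulated case and control score distributions overlap heavily, which is consistent with the low average AUROC reported in the abstract.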


Summary

Methods

A total of 3307 GWAS publications, corresponding to 6137 GWAS summaries, were collected in this manner. These GWA studies were further filtered based on the inclusion of an odds ratio (OR) and a risk allele frequency (RAF) for each reported SNP. An analysis of every study in GWAS Central indicated that roughly 70% of all published GWA studies (since 2009) have had sample sizes of greater than 1000 cases and 1000 controls. Although choosing this threshold should have left us with about 4300 studies to analyze, we found that many studies in GWAS Central were missing the SNP odds ratios, the risk allele frequencies, or both, which prevented their use in our calculations. After applying these filters, we were left with a total of 569 GWAS Central studies, corresponding to 219 different conditions.
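The two filters described above (a minimum sample size, and an OR and RAF for every reported SNP) can be sketched as follows. This is only an illustration, not the authors' pipeline: the record layout (`n_cases`, `n_controls`, `snps`, `odds_ratio`, `raf`) and the function names are hypothetical and do not reflect the actual GWAS Central schema.

```python
def is_large_enough(study, min_per_group=1000):
    """True if the study has more than min_per_group cases and controls."""
    return (
        study.get("n_cases", 0) > min_per_group
        and study.get("n_controls", 0) > min_per_group
    )


def has_usable_snps(study):
    """True if the study reports at least one SNP and every SNP has an OR and a RAF."""
    snps = study.get("snps", [])
    return bool(snps) and all(
        snp.get("odds_ratio") is not None and snp.get("raf") is not None
        for snp in snps
    )


def filter_studies(studies, min_per_group=1000):
    """Keep only studies that pass both the sample-size and the OR/RAF filter."""
    return [
        s for s in studies
        if is_large_enough(s, min_per_group) and has_usable_snps(s)
    ]


# Toy records (hypothetical values) showing one study that passes and two that fail.
toy_studies = [
    {"id": "A", "n_cases": 2500, "n_controls": 3000,
     "snps": [{"odds_ratio": 1.2, "raf": 0.31}]},
    {"id": "B", "n_cases": 800, "n_controls": 5000,
     "snps": [{"odds_ratio": 1.4, "raf": 0.12}]},   # too few cases
    {"id": "C", "n_cases": 4000, "n_controls": 4000,
     "snps": [{"odds_ratio": 1.1, "raf": None}]},   # missing RAF
]
kept_ids = [s["id"] for s in filter_studies(toy_studies)]
print(kept_ids)
```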

