Abstract

Background: There is increasing interest in investigating genetic risk models in empirical studies, but such studies are premature when the expected predictive ability of the risk model is low. We assessed how accurately the predictive ability of genetic risk models can be estimated in simulated data that are created based on the odds ratios (ORs) and frequencies of single-nucleotide polymorphisms (SNPs) obtained from genome-wide association studies (GWASs).

Methods: We aimed to replicate published prediction studies that reported the area under the receiver operating characteristic curve (AUC) as a measure of predictive ability. We searched GWAS articles for all SNPs included in these models and extracted ORs and risk allele frequencies to construct genotypes and disease status for a hypothetical population. Using these hypothetical data, we reconstructed the published genetic risk models and compared their AUC values to those reported in the original articles.

Results: The accuracy of the AUC values varied with the method used for the construction of the risk models. When logistic regression analysis was used to construct the genetic risk model, AUC values estimated by the simulation method were similar to the published values, with a median absolute difference of 0.02 [range: 0.00, 0.04]. This difference was 0.03 [range: 0.01, 0.06] for unweighted risk scores and 0.05 [range: 0.01, 0.08] for weighted risk scores.

Conclusions: The predictive ability of genetic risk models can be estimated using simulated data based on results from GWASs. Simulation methods can be useful to estimate the predictive ability in the absence of empirical data and to decide whether empirical investigation of genetic risk models is warranted.

Highlights

  • Empirical studies on genetic risk models for multifactorial diseases so far show that the predictive ability is moderate at best (Willems et al, 2011; Husing et al, 2012), with a few promising exceptions (Maller et al, 2006; Romanos et al, 2009)

  • When logistic regression analysis was used to construct the genetic risk model, area under the receiver-operating characteristic curve (AUC) values estimated by the simulation method were similar to the published values with a median absolute difference of 0.02 [range: 0.00, 0.04]

  • The predictive ability of genetic risk models can be estimated using simulated data based on results from genome-wide association studies (GWASs)


Summary

Introduction

Empirical studies on genetic risk models for multifactorial diseases so far show that the predictive ability is moderate at best (Willems et al, 2011; Husing et al, 2012), with a few promising exceptions (Maller et al, 2006; Romanos et al, 2009). Methods that estimate the expected predictive ability of a risk model in the absence of empirical data all assess it as the degree to which the risk model discriminates between patients and nonpatients, quantified as the area under the receiver operating characteristic (ROC) curve (AUC). Using epidemiological parameters such as the population-average risk of disease and the odds ratios (ORs) and frequencies of the genetic variants in the model, these methods obtain the AUC by simulating a dataset for a hypothetical population (Janssens et al, 2006; Pepe et al, 2010) or by using analytical formulas (Gail, 2008; Lu and Elston, 2008; Moonesinghe et al, 2010). We assessed how accurately the predictive ability of genetic risk models can be estimated in simulated data that are created based on the ORs and frequencies of single-nucleotide polymorphisms (SNPs) obtained from genome-wide association studies (GWASs).
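
To make the simulation idea concrete, the following Python sketch builds a hypothetical population from ORs, risk allele frequencies, and a population-average disease risk, then computes the AUC of a weighted and an unweighted risk score. It is an illustration under stated assumptions, not the procedure used in the study: it assumes independent SNPs, genotypes in Hardy-Weinberg equilibrium, per-allele ORs that combine multiplicatively in a logistic model, and an intercept calibrated by bisection. The function names (`simulate_population`, `auc`) and all input values are hypothetical and are not taken from any GWAS.

```python
import numpy as np


def simulate_population(odds_ratios, allele_freqs, disease_risk, n=100_000, seed=0):
    """Simulate genotypes and disease status for a hypothetical population.

    Assumptions (not from the source): independent SNPs, Hardy-Weinberg
    equilibrium, and per-allele ORs acting multiplicatively on the odds.
    """
    rng = np.random.default_rng(seed)
    log_ors = np.log(np.asarray(odds_ratios, dtype=float))
    freqs = np.asarray(allele_freqs, dtype=float)

    # Risk-allele counts (0, 1, or 2) per person and SNP.
    genotypes = rng.binomial(2, freqs, size=(n, freqs.size))
    weighted_score = genotypes @ log_ors

    # Find the intercept that makes the mean risk match disease_risk.
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        mean_risk = np.mean(1.0 / (1.0 + np.exp(-(mid + weighted_score))))
        if mean_risk < disease_risk:
            lo = mid
        else:
            hi = mid
    risk = 1.0 / (1.0 + np.exp(-(lo + weighted_score)))

    # Draw disease status from each person's individual risk.
    disease = rng.random(n) < risk
    return genotypes, weighted_score, disease


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores receive average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


if __name__ == "__main__":
    # Illustrative inputs only; not extracted from any published study.
    genotypes, weighted, disease = simulate_population(
        odds_ratios=[1.2, 1.5, 1.3],
        allele_freqs=[0.3, 0.2, 0.4],
        disease_risk=0.10,
    )
    unweighted = genotypes.sum(axis=1)  # simple risk-allele count
    print("AUC, weighted score:  ", round(auc(weighted, disease), 3))
    print("AUC, unweighted score:", round(auc(unweighted, disease), 3))
```

In this sketch the weighted score uses the same log ORs that generate disease status, so its AUC reflects the best case for the assumed model; a fuller replication would refit the model (for example by logistic regression) on the simulated data, as the abstract describes.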

