Abstract

Epidemiologic study designs represent a major challenge for genome-wide association studies. Most such studies to date have selected controls from the pool of participants without the disease of interest at the end of the study period. This choice can lead to biased estimates of exposure effects. Using data from the Framingham Heart Study (Genetic Analysis Workshop 16 Problem 2), we compare genetic association estimates from designs with control selection based on status at the end of the study (case exclusion (CE) sampling) with those from designs with control selection based on incidence density (ID) sampling, in which controls are selected from the pool of participants who are disease-free at the time a case is diagnosed. Cases are defined as those diagnosed with type 2 diabetes (T2D). We estimated odds ratios for 18 previously confirmed T2D variants using 189 cases selected by ID sampling and using 231 cases selected by CE sampling. We found none of these single-nucleotide polymorphisms to be significantly associated with T2D using either design. Because these empirical analyses were based on a small number of cases and on single-nucleotide polymorphisms with likely small effect sizes, we supplemented this work with simulated data sets of 500 cases for each strategy across a variety of allele frequencies and effect sizes. In our simulated data sets, ID sampling was less biased than CE sampling, although CE sampling showed apparently increased power because of the upward bias of its point estimates. We conclude that ID sampling is an appropriate option for genome-wide association studies.

Highlights

  • The genetic architecture of type 2 diabetes (T2D) appears to be composed of several genes, each of which has a modest impact on disease risk

  • In order to maximize precision, we chose a ratio of 10 controls per case for both sampling strategies in the FHS data

  • Because we did not have exact dates and body mass index (BMI) at onset of diabetes, we used the age at enrollment, i.e., the age at Visit 1, and BMI at enrollment to match cases and controls


Summary

Background

The genetic architecture of type 2 diabetes (T2D) appears to be composed of several genes, each of which has a modest impact on disease risk. The common method of control selection used for many GWAS is to form a single pool of potential controls consisting of subjects who were not cases by the end of the study period. This method has been shown by Greenland and Thomas [1] and Lubin and Gail [2] to lead to biased estimates of the rate ratio. Differences in the origin of populations of cases and controls can arise if the two groups are recruited independently or have different inclusion criteria, and the presence of population stratification can lead to a greater-than-nominal type I error rate. Another method of control selection, termed “incidence density sampling”, uses subjects who survived disease-free to the time of case occurrence to make a pool of potential controls for each case. We use the GWAS data from the Framingham Heart Study (FHS, Genetic Analysis Workshop 16 Problem 2) to compare the influence of control selection on the results for T2D.
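The difference between the two control pools can be illustrated with a minimal Python sketch. This is not the authors' code: the `Subject` class and both function names are hypothetical, and the sketch assumes complete follow-up (no censoring) and omits the matching on age and BMI at enrollment that the FHS analysis used.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Subject:
    """Hypothetical cohort member (illustrative, not from the paper)."""
    subject_id: int
    # Time of T2D diagnosis; None if never diagnosed during follow-up.
    onset_time: Optional[float]


def case_exclusion_controls(
    cohort: List[Subject], controls_per_case: int, rng: random.Random
) -> Dict[int, List[int]]:
    """CE sampling: every case draws from one pool of subjects who were
    still disease-free at the END of the study."""
    pool = [s for s in cohort if s.onset_time is None]
    selected = {}
    for case in cohort:
        if case.onset_time is None:
            continue
        k = min(controls_per_case, len(pool))
        selected[case.subject_id] = [s.subject_id for s in rng.sample(pool, k)]
    return selected


def incidence_density_controls(
    cohort: List[Subject], controls_per_case: int, rng: random.Random
) -> Dict[int, List[int]]:
    """ID sampling: each case draws from its own risk set, i.e. subjects
    who were disease-free AT THE TIME that case was diagnosed."""
    selected = {}
    for case in cohort:
        if case.onset_time is None:
            continue
        risk_set = [
            s for s in cohort
            if s.subject_id != case.subject_id
            and (s.onset_time is None or s.onset_time > case.onset_time)
        ]
        k = min(controls_per_case, len(risk_set))
        selected[case.subject_id] = [s.subject_id for s in rng.sample(risk_set, k)]
    return selected


# Toy cohort: subjects 1 and 2 become cases at times 2.0 and 5.0.
toy_cohort = [
    Subject(1, 2.0),
    Subject(2, 5.0),
    Subject(3, None),
    Subject(4, None),
]
rng = random.Random(0)
id_controls = incidence_density_controls(toy_cohort, 3, rng)
ce_controls = case_exclusion_controls(toy_cohort, 3, rng)
```

In this toy cohort, subject 2 is eligible as a control for case 1 under ID sampling because subject 2 is still disease-free at time 2.0, but subject 2 can never be a control under CE sampling. Excluding such future cases from the control pool is the feature of CE sampling that the cited work associates with biased rate ratio estimates.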

