Abstract

We consider the feasibility of reusing existing control data obtained in genetic association studies in order to reduce costs for new studies. We discuss controlling for the population differences between cases and controls that are implicit in studies utilizing external control data. We give theoretical calculations of the statistical power of a test due to Bourgain et al. (Am J Human Genet 2003), applied to the problem of dealing with case-control differences in genetic ancestry related to population isolation or population admixture. Theoretical results show that there may exist bounds on the non-centrality parameter for a test of association that place limits on study power even if sample sizes grow arbitrarily large. We apply this method to data from a multi-center, geographically diverse, genome-wide association study of breast cancer in African-American women. Our analysis of these data shows that admixture proportions differ by center, with the average fraction of European admixture ranging from approximately 20% for participants from study sites in the Eastern United States to 25% for participants from West Coast sites. However, these differences in average admixture fraction between sites are largely counterbalanced by considerable diversity in individual admixture proportion within each study site. Our results suggest that statistical correction for admixture differences is feasible for future studies of African-Americans that utilize the existing controls from the African-American Breast Cancer study, even if case ascertainment for the future studies is not balanced over the same centers or regions that supplied the controls for the current study.
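To illustrate why a bounded non-centrality parameter caps power no matter how large the sample becomes, the following is a minimal sketch rather than the calculation used in the paper. It assumes the association test statistic is approximately chi-square with 1 degree of freedom; the function names, the genome-wide significance level of 5e-8, and the saturation curve with its parameters `ncp_max` and `half_n` are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_1df(ncp, alpha=5e-8):
    """Power of a 1-df chi-square test with non-centrality parameter ncp.

    Uses the identity that a non-central chi-square(1) variable is the
    square of a normal variable with mean sqrt(ncp) and unit variance.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = sqrt(ncp)
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def bounded_ncp(n, ncp_max=20.0, half_n=2000.0):
    """Hypothetical non-centrality parameter that saturates at ncp_max.

    This functional form is an assumption for illustration only; it is
    not taken from the paper.
    """
    return ncp_max * n / (n + half_n)


# Power approaches power_1df(ncp_max) and never exceeds it.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(power_1df(bounded_ncp(n)), 3))
print("ceiling", round(power_1df(20.0), 3))
```

Under these assumptions, increasing the sample size beyond the point where the non-centrality parameter is close to its bound yields almost no additional power, which is the qualitative behavior the theoretical results describe.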

Highlights

  • A genetic association study estimating the main effects of single nucleotide polymorphisms (SNPs) or other genetic variants upon the risk of a rare or common disease in minority populations is a setting in which it is especially attractive to consider the use of existing genotype data as a supplementary or even a primary source of controls

  • We examine empirically the false positive rates that occur when cases from one geographical location or study within the African-American Breast Cancer (AABC) study are combined with controls from other AABC locations or studies, as well as the success of adjustment for the observed population differences in global genetic ancestry when analyzing such illustrative data sets derived from the AABC study

  • We have adopted a somewhat non-standard approach in relying upon the Bourgain test rather than principal components [8] or related methods [22,23,24] to control for population structure in a genome-wide association study (GWAS) of a minority population with cases/controls drawn from multiple studies with different designs and recruitment approaches

Introduction

A genetic association study estimating the main effects of single nucleotide polymorphisms (SNPs) or other genetic variants upon the risk of a rare or common disease in minority populations is a setting in which it is especially attractive to consider the use of existing genotype data as a supplementary or even a primary source of controls. We are interested in whether studies in which cases and controls are sampled differently will give correct answers and will be as statistically powerful as studies in which new control data are genotyped. Because of the huge investments made recently in large-scale genotyping of cases and controls for various diseases, this is a timely question. It is especially important for understanding the genetic causes of disease in as-yet relatively understudied population groups, such as African-Americans, where reusing existing controls could speed up progress. We analyze real data from a major study of the genetic causes of breast cancer in African-American women in order to shed practical light upon this issue.
