Large numbers of control individuals with genome-wide genotype data are now available through various databases. These controls are regularly used in case-control genome-wide association studies (GWAS) to increase statistical power. However, such controls are often "unselected" for the disease of interest and are not matched to cases on potential confounding factors, which makes the studies more vulnerable to confounding by population stratification. In this communication, we demonstrate that family-based designs can integrate unselected controls from other studies into the analysis without compromising the robustness of family-based designs against genetic confounding. The result is a hybrid case-control family-based analysis that achieves higher power levels than population-based studies with the same number of cases and controls. This strategy is widely applicable and is well suited to any situation in which both family and case-control data are available. The approach consists of three steps. First, we perform a standard family-based association test that does not utilize the between-family component. Second, we use the between-family information in conjunction with the genotypes from unselected controls in a Cochran-Armitage trend test. The p values for this step are then obtained by rank-ordering the Cochran-Armitage trend test statistics across the genotyped markers. Third, we combine the association p values from the first two steps into a single p value. Simulation studies are used to assess the achievable power levels of this method compared to standard analysis approaches. We illustrate the approach with an application to a GWAS of attention deficit hyperactivity disorder that uses parent-offspring trios and publicly available controls.
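The second and third steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. Several details are assumptions that the abstract does not specify: the additive genotype weights (0, 1, 2), the rank-to-p-value conversion rank / (M + 1), and the use of Fisher's method to combine the two p values. The function names are hypothetical, and the family-based p value from the first step is taken as a given input.

```python
import math

def trend_statistic(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) for a 2x3 genotype table.

    case_counts / control_counts: counts of the three genotypes.
    weights: assumed additive coding (0, 1, 2 copies of the allele).
    """
    r = sum(case_counts)
    s = sum(control_counts)
    n = r + s
    col = [a + b for a, b in zip(case_counts, control_counts)]
    # Weighted difference between observed and expected case counts
    u = sum(w * (a - r * c / n) for w, a, c in zip(weights, case_counts, col))
    sw = sum(w * c for w, c in zip(weights, col))
    sw2 = sum(w * w * c for w, c in zip(weights, col))
    var = r * s / n * (sw2 / n - (sw / n) ** 2)
    return u * u / var if var > 0 else 0.0

def rank_pvalues(statistics):
    """Rank-based p values: the largest statistic gets the smallest p value.

    Assumption: p = rank / (M + 1), where rank counts the statistics that
    are at least as large as the marker's own statistic.
    """
    m = len(statistics)
    return [sum(1 for t in statistics if t >= s) / (m + 1) for s in statistics]

def fisher_combine(p_family, p_between):
    """Combine two independent p values with Fisher's method (assumed here).

    -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 df under
    the null, whose upper tail has the closed form used below.
    """
    prod = p_family * p_between
    return prod * (1.0 - math.log(prod))
```

For each marker, `fisher_combine` would be called with the family-based p value from the first step and the corresponding entry of `rank_pvalues` from the second step.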