Abstract

Recent studies have suggested that a large number of common variants may, in aggregate, underlie a substantial proportion of the heritability of complex traits such as schizophrenia, multiple sclerosis and height, even though the effect of each individual variant is typically very small (1-5). A persistent concern for population-based genome-wide association studies (GWAS), however, is that subtle population stratification could lead to bias. Although the abovementioned studies adhered to “best practice” for controlling bias in GWAS, some authors have speculated, with respect to the International Schizophrenia Consortium’s (ISC) schizophrenia study, that “cryptic population stratification could substantially affect [the results]” (6). Here, we repeat the analysis described in (1) using a family-based sample to ask directly whether this is in fact a viable contention, because family-based designs, by definition, control completely for “cryptic population stratification”.
We analyzed GWAS data on 694 parent-offspring schizophrenia trios from Bulgaria. Of these probands, 360 were cases in the original ISC manuscript. All samples were ascertained and genotyped on Affymetrix 6.0 arrays following the protocols described in (1), with the exception that we further required SNPs to have a genotyping rate greater than 99% and no more than 1 Mendel error. The original analysis in (1) created “scores” in case/control target samples, where the score per individual was a weighted sum of “risk alleles”, with the weights and the alleles determined by standard tests of association in an independent “discovery” case/control sample. Here, we designated the entire ISC sample (excluding all Bulgarian individuals) as the “discovery” sample; our target comparison was between the transmitted and untransmitted alleles of the Bulgarian trios. In other words, we asked whether putative risk alleles from a case/control study tend, on average, to be over-transmitted to offspring with schizophrenia.
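
The transmitted versus untransmitted comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in (1) or in this study. It assumes each genotype is coded as the count (0, 1 or 2) of the score allele, that the weights are per-SNP effect estimates from the discovery sample (for example log odds ratios), and that SNPs showing a Mendel error in a trio are simply skipped. The names `trio_scores`, `is_mendelian` and `TRANSMISSIBLE` are hypothetical. For a Mendelian-consistent trio, the child carries the two transmitted alleles, so the untransmitted count is father + mother - child.

```python
# Illustrative sketch only; all names and the genotype coding are assumptions.

# Score-allele counts a parent can pass on, keyed by the parent's genotype.
TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian(father, mother, child):
    """True if the child's genotype can arise from the parental genotypes."""
    return any(a + b == child
               for a in TRANSMISSIBLE[father]
               for b in TRANSMISSIBLE[mother])


def trio_scores(father, mother, child, weights):
    """Return (transmitted, untransmitted) weighted scores for one trio.

    father, mother, child: per-SNP counts (0, 1, 2) of the score allele.
    weights: per-SNP weights from the discovery sample (assumed log odds ratios).
    """
    transmitted = 0.0
    untransmitted = 0.0
    for f, m, c, w in zip(father, mother, child, weights):
        if not is_mendelian(f, m, c):
            continue  # assumption: skip SNPs with a Mendel error in this trio
        transmitted += w * c              # the child carries both transmitted alleles
        untransmitted += w * (f + m - c)  # parental alleles that were not passed on
    return transmitted, untransmitted


if __name__ == "__main__":
    # Toy trio with three SNPs and made-up weights.
    t, u = trio_scores([1, 2, 0], [1, 1, 1], [2, 1, 0], [0.10, 0.05, 0.20])
    print(t, u)
```

Restricting the loop to SNPs below a chosen discovery P-value threshold would give one pair of scores per threshold.
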
This analysis does not constitute an independent replication of (1) because of the overlapping cases; rather, our current purpose is solely to exclude bias due to cryptic population stratification as a possible source of inflated type I error. Although within-family association statistics are free from bias due to population stratification, they are in fact susceptible to certain technical biases that do not impact population-based studies, arising from non-random genotyping error (7). Particularly for low-frequency variants, heterozygotes are more likely to be misclassified as the common homozygote than as the rare homozygote. In family-based studies, this leads to a bias in which the common allele appears to be over-transmitted. This is because, if only one parent is heterozygous and transmits the minor allele, a miscall indicating the common homozygote in the parent will result in a Mendel error, whereas the same error in the offspring will result in an apparent transmission of the common allele. We observed this phenomenon in our total trio dataset: using the transmission disequilibrium test (TDT) (8) the mean log of the odds ratio for the minor allele is -0.004, which is significantly different from 0 (p 300 out of a total of 525,571 SNPs). We also removed all SNPs with less than complete genotyping (because SNPs with lower call rates tend to have more miscalling errors), or a minor allele frequency less than 2%. The resulting dataset showed no significant bias (P=0.34).
We designated “score alleles” in the discovery sample (2,794 cases and 2,976 controls from (1)) for a subset of 45,544 SNPs selected to be in approximate linkage equilibrium, preferentially retaining SNPs with higher association in the discovery sample. In the target sample we calculated the weighted sum of “score alleles” at various discovery P-value thresholds (P<0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5) for the transmitted and untransmitted alleles.
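
The TDT quantities referred to above (the per-SNP log odds ratio for the minor allele and the associated test statistic) can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: genotypes are coded as minor-allele counts, trios with a Mendel error at the SNP are skipped, and the function names `transmission_counts` and `tdt` are hypothetical. Only heterozygous parents are informative; the statistic is the usual McNemar-type chi-square with 1 degree of freedom.

```python
# Illustrative sketch only; names and genotype coding are assumptions.
from math import erfc, log, sqrt


def transmission_counts(trios):
    """Count minor/major alleles transmitted by heterozygous parents at one SNP.

    trios: iterable of (father, mother, child) minor-allele counts (0, 1, 2).
    Returns (minor_transmitted, major_transmitted).
    """
    minor_t = 0
    major_t = 0
    for f, m, c in trios:
        het_parents = (f == 1) + (m == 1)
        hom_minor_parents = (f == 2) + (m == 2)
        # Each minor-homozygous parent must pass on one minor allele; the rest
        # of the child's minor alleles came from heterozygous parents.
        from_hets = c - hom_minor_parents
        if not 0 <= from_hets <= het_parents:
            continue  # Mendel error: skip this trio (assumption)
        minor_t += from_hets
        major_t += het_parents - from_hets
    return minor_t, major_t


def tdt(minor_t, major_t):
    """Return (log odds ratio, chi-square, P-value) for the minor allele."""
    if minor_t == 0 or major_t == 0:
        raise ValueError("log odds ratio is undefined when a count is zero")
    log_or = log(minor_t / major_t)
    chi2 = (minor_t - major_t) ** 2 / (minor_t + major_t)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return log_or, chi2, p_value


if __name__ == "__main__":
    toy_trios = [(1, 0, 1), (1, 1, 1), (2, 1, 1), (1, 0, 0), (2, 0, 0)]
    b, c = transmission_counts(toy_trios)
    print(b, c, tdt(b, c))
```

A negative log odds ratio means the minor allele was transmitted less often than the major allele, which is the direction of the genotyping-error bias described above.
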
Figure 1 shows the results of the primary analysis, in which we compared scores between transmitted and untransmitted chromosomes using logistic regression as in (1). At every discovery P-value threshold, transmitted chromosomes had significantly higher rates of the score alleles (i.e. alleles that were more common in cases than in controls in the independent discovery analysis). The estimate of variance explained (Nagelkerke’s pseudo-R²) by the observed score reached ~5%, which is similar to values from (1). We confirmed these results using a 1-sided t-test on the difference in score between matched pairs of transmitted and untransmitted chromosomes (data not shown).
Figure 1. Variance explained by P-value threshold, with the corresponding significance at each threshold (P=0.0044, 5.4×10⁻⁶, 3.2×10⁻⁸, 5.1×10⁻¹⁰, 1.2×10⁻¹¹, 2.2×10⁻¹², 1.03×10⁻¹²).
In summary, individuals with schizophrenia from distinct European populations show enrichment across a very large number of SNPs for the same sets of common alleles. As previously discussed (1), this observation is consistent with a highly polygenic model of disease risk involving causal common variation. Further to the arguments already presented in (1), we can reject cryptic population stratification as a viable alternative explanation.
