We describe the implementation of a set-based method for assessing the significance of findings from genome-wide association study data. Our method, implemented in PLINK, is based on a theoretical approximation to Fisher's statistic, so that P-values at a gene or across a pathway are combined in a way that accounts for the correlation structure, or linkage disequilibrium (LD), between single nucleotide polymorphisms. We compare our method with a permutation-based product-of-P-values approach and show correlations typically in excess of 0.98 across a number of comparisons. The method gives Type I error rates that are less than or equal to the corresponding nominal significance levels, so it does not inflate the rate of false positives. We show that, in broadly similar populations, reference data sets of markers are an appropriate substrate for deriving marker-marker LD; this removes the need to access individual-level genotypes and greatly broadens the method's applicability. We show that the method is thus robust to LD-associated bias and performs equivalently to permutation-based methods with a substantially shorter runtime. This is particularly relevant as ever larger genetic data sets become publicly available, and the method should greatly assist in their rapid analysis.
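As a rough illustration of how P-values can be combined while accounting for correlation between SNPs, the sketch below uses Brown's approximation to Fisher's method: Fisher's statistic X = -2 Σ ln p_i is rescaled to X/c and referred to a χ² distribution with f degrees of freedom, where c and f are chosen to match the mean and variance of X under correlation. This is not claimed to be the PLINK implementation. Assumptions of the sketch: the covariance of -2 ln p between two tests is approximated from the correlation ρ between their test statistics by the cubic 3.263ρ + 0.710ρ² + 0.027ρ³ (Kost and McDermott), and an LD correlation r taken from a reference panel is used as that ρ. The function names `chi2_sf`, `kost_mcdermott_cov` and `brown_combined_p` are hypothetical.

```python
import math


def _lower_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its series (used when x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (modified Lentz)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x: float, df: float) -> float:
    """Upper tail P(X > x) of a chi-square with (possibly non-integer) df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _lower_series(a, half_x)
    return _upper_contfrac(a, half_x)


def kost_mcdermott_cov(rho: float) -> float:
    # ASSUMPTION: cubic approximation to cov(-2 ln p_i, -2 ln p_j) given correlation rho.
    return 3.263 * rho + 0.710 * rho ** 2 + 0.027 * rho ** 3


def brown_combined_p(pvalues, corr) -> float:
    """Combine correlated P-values with Brown's scaled chi-square approximation.

    corr[i][j] is the (assumed) correlation between the test statistics of SNPs i and j,
    e.g. an LD r estimated from a reference panel.
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one P-value in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in pvalues)  # Fisher's statistic
    mean = 2.0 * k  # each -2 ln p has mean 2 under the null
    var = 4.0 * k   # ... and variance 4
    for i in range(k):
        for j in range(i + 1, k):
            var += 2.0 * kost_mcdermott_cov(corr[i][j])
    if var <= 0.0:
        raise ValueError("correlations imply a non-positive variance")
    c = var / (2.0 * mean)         # scale factor
    f = 2.0 * mean * mean / var    # effective degrees of freedom
    return chi2_sf(x / c, f)


p_gene = brown_combined_p(
    [0.01, 0.02, 0.2],
    [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]],
)
```

With all correlations set to zero, c = 1 and f = 2k, so the sketch reduces to Fisher's method for independent tests; with two perfectly correlated tests that share the same P-value, it returns that P-value, since the pair carries no more evidence than one test.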