Case-control Genome-Wide Association Studies (GWAS) provide a rich resource for studying the genetic architecture of complex diseases. A key goal is to elucidate how genetic effects vary with the environment, a phenomenon traditionally termed gene-environment (GxE) interaction. An often-overlooked complication is that multiple distinct pathophysiologic mechanisms can lead to the same clinical diagnosis, and these mechanisms often have distinct genetic bases. In this paper, we first show that using clinically diagnosed status can lead to severely biased estimates of GxE interactions when the frequency of the pathologic diagnosis of interest, relative to other diagnoses, depends on the environment. We then propose a pseudo-likelihood approach to correct this bias. Finally, we demonstrate our method in extensive simulations and in a GWAS of Alzheimer’s disease.
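The bias mechanism described above can be illustrated with a small toy calculation. This is a minimal sketch under assumed values, not the paper's simulation design or its pseudo-likelihood correction. It assumes a binary genotype G and environment E, a pathologic subtype A that follows a logistic model with no GxE term (so its true interaction odds ratio is exactly 1), and a second subtype B whose risk depends on E but not on G. The clinical diagnosis D pools A and B, which are treated as mutually exclusive. All parameter values and function names (`risk_a`, `risk_d`, `interaction_or`) are hypothetical and chosen only for illustration. Odds ratios are computed exactly from the assumed risks, so there is no sampling noise.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p: float) -> float:
    return p / (1.0 - p)


# Hypothetical parameters, chosen only for illustration.
ALPHA = math.log(0.01 / 0.99)  # baseline log-odds of subtype A
BETA_G = math.log(2.0)         # genetic log-OR for subtype A
BETA_E = math.log(1.5)         # environmental log-OR for subtype A
P_B = {0: 0.001, 1: 0.02}      # risk of subtype B: depends on E, not on G


def risk_a(g: int, e: int) -> float:
    # Logistic model with no GxE term, so the true interaction OR is 1.
    return expit(ALPHA + BETA_G * g + BETA_E * e)


def risk_d(g: int, e: int) -> float:
    # Clinical diagnosis pools subtypes A and B (assumed mutually exclusive).
    return risk_a(g, e) + P_B[e]


def or_g_given_e(risk, e: int) -> float:
    """Odds ratio for G=1 versus G=0 within environment stratum e."""
    return odds(risk(1, e)) / odds(risk(0, e))


def interaction_or(risk) -> float:
    """Ratio of stratum-specific genetic ORs (E=1 over E=0)."""
    return or_g_given_e(risk, 1) / or_g_given_e(risk, 0)


if __name__ == "__main__":
    print(f"GxE OR using true subtype A:       {interaction_or(risk_a):.3f}")
    print(f"GxE OR using clinical diagnosis D: {interaction_or(risk_d):.3f}")
```

With these assumed values, the interaction odds ratio is 1 for subtype A but about 0.75 for the pooled diagnosis D. A spurious GxE interaction appears only because the share of the genetically unrelated subtype B among diagnosed cases grows with E.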