Abstract

Although single-locus approaches have been widely applied to identify disease-associated single-nucleotide polymorphisms (SNPs), complex diseases are thought to be the product of multiple interactions between loci. This has led to the recent development of statistical methods for detecting interactions between two loci. Canonical correlation analysis (CCA) has previously been proposed to detect gene–gene coassociation. However, this approach is limited to detecting linear relations and can only be applied when the number of observations exceeds the number of SNPs in a gene. This limitation is particularly important for next-generation sequencing, which could yield a large number of novel variants on a limited number of subjects. To overcome these limitations, we propose an approach to detect gene–gene interactions on the basis of a kernelized version of CCA (KCCA). Our simulation studies showed that KCCA controls the Type-I error and is more powerful than leading gene-based approaches under a disease model with negligible marginal effects. To demonstrate the utility of our approach, we also applied KCCA to assess interactions between 200 genes in the NF-κB pathway in relation to ovarian cancer risk in 3869 cases and 3276 controls. We identified 13 significant gene pairs relevant to ovarian cancer risk (local false discovery rate <0.05). Finally, we discuss the advantages of KCCA in gene–gene interaction analysis and its future role in genetic association studies.
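To make the idea of a kernelized CCA concrete, the following is a minimal sketch of a generic regularized kernel CCA between the genotype matrices of two genes. It is not the authors' implementation: the Gaussian (RBF) kernel, its default bandwidth, the regularization value `reg`, and the names `rbf_kernel`, `center_kernel`, and `kcca_first_correlation` are all assumptions chosen for illustration. Because the computation works on n × n kernel matrices, it remains defined when a gene has more SNPs than there are subjects, which is the case where ordinary CCA cannot be applied.

```python
import numpy as np


def rbf_kernel(X, gamma=None):
    """Gaussian kernel matrix between the rows (subjects) of X."""
    X = np.asarray(X, dtype=float)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default bandwidth: 1 / number of SNPs
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11'/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca_first_correlation(X, Y, reg=0.1):
    """Leading regularized kernel canonical correlation between X and Y.

    X and Y are (subjects x SNPs) genotype matrices for two genes, with the
    same subjects in the same row order. `reg` is an assumed ridge-type
    penalty (scaled by n); without it the kernel canonical correlation is
    trivially 1 whenever the kernel matrices have full rank.
    """
    n = np.asarray(X).shape[0]
    Kx = center_kernel(rbf_kernel(X))
    Ky = center_kernel(rbf_kernel(Y))
    ridge = reg * n * np.eye(n)
    # The squared canonical correlations are the eigenvalues of
    # (Kx + ridge)^-1 Ky (Ky + ridge)^-1 Kx.
    A = np.linalg.solve(Kx + ridge, Ky)
    B = np.linalg.solve(Ky + ridge, Kx)
    rho2 = np.max(np.linalg.eigvals(A @ B).real)
    return float(np.sqrt(np.clip(rho2, 0.0, 1.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated additive genotypes (0/1/2); 40 SNPs on only 30 subjects.
    gene_a = rng.binomial(2, 0.3, size=(30, 40))
    gene_b = rng.binomial(2, 0.3, size=(30, 40))
    print(kcca_first_correlation(gene_a, gene_b))
```

In a case-control setting, one could compute this quantity separately in cases and in controls and compare the two values; how the authors construct their test statistic is not stated in this excerpt, so that step is left out of the sketch.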

Highlights

  • Genome-wide association studies (GWAS) have identified hundreds of loci that harbor genetic variants that influence predisposition to a particular phenotype

  • As the poor performance of canonical correlation analysis (CCA) [8] is likely due to overfitting, we considered additional simulations in which the sample size was fixed at 1000 and the number of markers per gene was set to 10, 20, 30, 40, or 50, randomly subset from the total number of markers used in our simulations

  • It is important to note that, although our simulation results show that gene-level interaction analysis using the kernelized version of CCA (KCCA) is more powerful than other current methods under an interaction-only model, additional simulations have shown that the PC-based logistic regression (PC-LR) [20] approach performs best in the presence of marginal effects (Figure 4)


Summary

Introduction

Genome-wide association studies (GWAS) have identified hundreds of loci that harbor genetic variants that influence predisposition to a particular phenotype. One strategy for multi-locus modeling is to jointly model the effects of all SNPs within a given gene (eg, with multivariable logistic regression models). This approach may lack power because the model's degrees of freedom could be large, and it may require filtering or shrinkage approaches. Another drawback to the joint modeling of multiple SNPs within a gene is possible model-fitting issues due to multicollinearity between SNPs (ie, linkage disequilibrium (LD)), as well as the lack of inclusion of LD information in the analysis. The results of such SNP–SNP interaction analyses lack clear biological interpretability.

