Abstract

Background

Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden of detecting SNP interactions in survival analysis, such as analysis of age of disease onset. To this end, we developed a novel algorithm, the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses martingale residuals as the outcome variable to represent survival outcomes and applies the Quantitative Multifactor Dimensionality Reduction (QMDR) method to identify significant interactions associated with age of disease onset.

Methods

To demonstrate efficacy, we evaluated this method on two simulated data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with less computational workload and allowed adjustment for covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data, with 14,935 lung cancer cases and 12,787 controls (108,254 SNPs), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data.

Results

Our analysis of the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker for predicting age of lung cancer onset.

Conclusions

From the results of our extensive simulations and analysis of a large GWAS, we demonstrated that our method is an efficient algorithm for identifying genetic interactions to include in models that predict survival outcomes.
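The two ES-MDR steps named in the abstract (martingale residuals as the outcome, then a QMDR-style scan of SNP pairs) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: residuals come from a null Cox model (Nelson-Aalen cumulative hazard, no covariates), so the covariate adjustment mentioned above is omitted; genotypes are assumed to be coded 0/1/2; each genotype cell is labeled "high" if its mean residual exceeds the overall mean; and the high/low comparison uses a Welch t-statistic, with no cross-validation or permutation testing. The function names `martingale_residuals`, `qmdr_pair`, and `best_pair` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def martingale_residuals(times, events):
    """r_i = delta_i - H0(t_i) from a null Cox model (assumption: no covariates).

    H0 is the Nelson-Aalen cumulative hazard estimate.
    """
    event_times = sorted({t for t, d in zip(times, events) if d})
    cum_hazard, h = [], 0.0
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)                       # n_j
        deaths = sum(1 for s, d in zip(times, events) if d and s == t)  # d_j
        h += deaths / at_risk
        cum_hazard.append((t, h))

    def H(t):
        # Step function: hazard accumulated over event times <= t
        value = 0.0
        for et, hv in cum_hazard:
            if et > t:
                break
            value = hv
        return value

    return [d - H(t) for t, d in zip(times, events)]


def qmdr_pair(g1, g2, y):
    """QMDR-style score for one SNP pair on a quantitative outcome y."""
    overall = mean(y)
    cells = {}
    for a, b, v in zip(g1, g2, y):
        cells.setdefault((a, b), []).append(v)
    # Genotype cells whose mean outcome exceeds the overall mean are "high"
    high = {c for c, vals in cells.items() if mean(vals) > overall}
    h = [v for a, b, v in zip(g1, g2, y) if (a, b) in high]
    low = [v for a, b, v in zip(g1, g2, y) if (a, b) not in high]
    if len(h) < 2 or len(low) < 2:
        return 0.0
    # Welch two-sample t-statistic, high vs low group (an illustrative choice)
    se = math.sqrt(variance(h) / len(h) + variance(low) / len(low))
    diff = mean(h) - mean(low)
    return math.inf if se == 0 else diff / se


def best_pair(genotypes, y):
    """Exhaustive search over all two-way SNP combinations."""
    best = None
    for i, j in combinations(range(len(genotypes)), 2):
        t = qmdr_pair(genotypes[i], genotypes[j], y)
        if best is None or t > best[2]:
            best = (i, j, t)
    return best


if __name__ == "__main__":
    # Toy data: ages at onset (or censoring) and event indicators
    times = [5, 3, 3, 8, 2, 7, 4, 6]
    events = [1, 1, 0, 1, 1, 0, 1, 1]
    genotypes = [[0, 0, 1, 1, 0, 0, 1, 1],
                 [0, 1, 0, 1, 0, 1, 0, 1],
                 [0, 1, 2, 0, 1, 2, 0, 1]]
    r = martingale_residuals(times, events)
    print(best_pair(genotypes, r))
```

Turning the censored survival outcome into a single residual per subject is what lets a quantitative-trait method score each pair with simple cell means; a positive residual indicates more events than the null model expects by that age, that is, earlier onset.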

Highlights

  • Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges

  • We demonstrated how the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method improved on the efficiency of Surv-MDR and allowed adjustment for covariate effects when analyzing large-scale survival and genetic data for associations between SNP interactions and age of disease onset

  • To analyze the effectiveness of the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, we evaluated our approach using the genome-wide genotyped lung cancer OncoArray-TRICL (Transdisciplinary Research Into Cancer of the Lung) Consortium data to detect and characterize SNP interactions that were associated with lung cancer age-of-onset


Summary

Introduction

Genome-wide association studies (GWAS) that used single-locus models, testing each single nucleotide polymorphism (SNP) for association with a phenotype, have been instrumental in identifying thousands of genetic variants associated with human traits and disorders [1,2,3,4]. However, identifying SNP interactions at the genome-wide scale is limited by computational and statistical challenges: epistasis detection requires analyzing high-dimensional data and testing millions of interaction models in an exhaustive search [6, 12]. If the genotypic combinations that confer risk are nonadditive, finding the combinations of genotypes that increase risk can become a complex combinatorial challenge [7].
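To give a sense of scale for an exhaustive two-way search, the number of SNP pairs grows quadratically with the number of SNPs, as C(n, 2) = n(n − 1)/2. The short Python calculation below uses the 108,254 SNPs reported for the OncoArray-TRICL analysis. The Bonferroni threshold is purely illustrative; which multiple-testing correction the study applied is not stated here.

```python
from math import comb

n_snps = 108_254              # SNP count reported for the OncoArray-TRICL data
n_pairs = comb(n_snps, 2)     # number of two-way interaction models to test
alpha = 0.05
bonferroni = alpha / n_pairs  # illustrative per-test threshold (assumption)

print(f"{n_pairs:,} pairs; Bonferroni threshold {bonferroni:.2e}")
# prints "5,859,410,131 pairs; Bonferroni threshold 8.53e-12"
```

Roughly 5.9 billion models, each requiring a pass over every subject, is why per-model cost dominates the feasibility of a genome-wide interaction scan.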
