Abstract
Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset.
Methods: To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data, with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254), searching over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data.
Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker to predict age of lung cancer onset.
Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.
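The two-step idea described in the abstract (martingale residuals as a continuous outcome, followed by a quantitative MDR step) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a null survival model with no covariates, so that each residual is the event indicator minus the Nelson-Aalen cumulative hazard at the subject's time, genotypes coded 0/1/2, and a Welch t statistic as the model score. The function names `martingale_residuals` and `qmdr_score` are hypothetical, and cross-validation, covariate adjustment, and permutation testing are omitted.

```python
import numpy as np


def martingale_residuals(time, event):
    """Null-model martingale residuals: event_i - H(t_i).

    H is the Nelson-Aalen cumulative hazard (assumption: no covariates).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    uniq = np.unique(time[event == 1])                    # sorted distinct event times
    d = np.array([event[time == t].sum() for t in uniq])  # events at each event time
    n = np.array([(time >= t).sum() for t in uniq])       # subjects still at risk
    cumhaz = np.concatenate([[0.0], np.cumsum(d / n)])
    idx = np.searchsorted(uniq, time, side="right")       # count of event times <= t_i
    return event - cumhaz[idx]


def qmdr_score(g1, g2, resid):
    """Score one two-SNP model on a continuous outcome.

    Each genotype cell is labelled "high" if its mean residual exceeds the
    overall mean, otherwise "low"; the score is a Welch t statistic comparing
    the pooled high and low groups (assumed choice of test statistic).
    """
    g1, g2, resid = np.asarray(g1), np.asarray(g2), np.asarray(resid, dtype=float)
    cell = g1 * 3 + g2                                    # 0/1/2 coding -> up to 9 cells
    overall = resid.mean()
    high = np.zeros(resid.shape, dtype=bool)
    for c in np.unique(cell):
        mask = cell == c
        if resid[mask].mean() > overall:
            high[mask] = True
    hi, lo = resid[high], resid[~high]
    if len(hi) < 2 or len(lo) < 2:
        return 0.0                                        # not enough data to compare
    se = np.sqrt(hi.var(ddof=1) / len(hi) + lo.var(ddof=1) / len(lo))
    return float((hi.mean() - lo.mean()) / se) if se > 0 else 0.0


# Toy usage with made-up data: ages at onset or censoring, and event indicators.
toy_time = [5, 3, 8, 2, 9, 4]
toy_event = [1, 1, 0, 1, 0, 1]
toy_resid = martingale_residuals(toy_time, toy_event)
toy_score = qmdr_score([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 0, 0], toy_resid)
```

Because the residuals are computed once and then reused for every SNP pair, the per-model work in this sketch reduces to a group-mean comparison, which is consistent with the efficiency argument made in the abstract.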
Highlights
Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges
We demonstrated how the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method improved on the efficiency of Surv-MDR and allowed for adjustment of covariate effects, so that large-scale survival and genetic data can be analyzed for associations between SNP interactions and age of disease-onset
To analyze the effectiveness of the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, we evaluated our approach using the genome-wide genotyped lung cancer OncoArray-TRICL (Transdisciplinary Research Into Cancer of the Lung) Consortium data to detect and characterize SNP interactions that were associated with lung cancer age-of-onset
Summary
Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. Single-locus GWAS, which test each SNP for association with a phenotype, have been instrumental in identifying thousands of genetic variants associated with human traits and disorders [1,2,3,4]. Epistasis detection faces computational and statistical challenges in analyzing high-dimensional data and in testing millions of interaction models from an exhaustive search in GWAS [6, 12]. If the genotypic combinations that confer risk are nonadditive, finding the combinations of genotypes that increase risk can become a complex combinatorial challenge [7].
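To make the scale of the exhaustive search concrete, the short calculation below counts the distinct k-way SNP models for a given panel size, using the 108,254 SNPs reported in the abstract. The helper name `n_interaction_models` is hypothetical and is introduced only for illustration.

```python
import math


def n_interaction_models(n_snps: int, order: int = 2) -> int:
    """Number of distinct unordered SNP combinations of the given order."""
    return math.comb(n_snps, order)


# Two-way search over the SNP count reported for the OncoArray-TRICL analysis.
pairs = n_interaction_models(108_254, 2)
print(f"{pairs:,}")  # prints 5,859,410,131
```

At this panel size, an exhaustive two-way search therefore evaluates roughly 5.9 billion models, so the cost of scoring a single model dominates the total run time.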