Abstract
This article presents the ability of an omnibus permutation test on ensembles of two-locus analyses (2LOmb) to detect pure epistasis in the presence of genetic heterogeneity. The performance of 2LOmb is evaluated in various simulation scenarios covering two independent causes of complex disease, where each cause is governed by a purely epistatic interaction. The scenarios differ in the number of available single nucleotide polymorphisms (SNPs) in the data, the number of causative SNPs and the ratio of case samples from the two affected groups. The simulation results indicate that 2LOmb outperforms the multifactor dimensionality reduction (MDR) and random forest (RF) techniques, producing fewer output SNPs and a higher number of correctly identified causative SNPs. Moreover, 2LOmb is capable of identifying the number of independent interactions in tractable computational time and can be used in genome-wide association studies. 2LOmb is subsequently applied to a type 1 diabetes mellitus (T1D) data set, which was collected from a UK population by the Wellcome Trust Case Control Consortium (WTCCC). After screening for SNPs that are located within or near genes and exhibit no marginal single-locus effects, the T1D data set is reduced to 95,991 SNPs from 12,146 genes. The 2LOmb search in the reduced T1D data set reveals that 12 SNPs, which can be divided into two independent sets, are associated with the disease. The first SNP set consists of three SNPs from MUC21 (mucin 21, cell surface associated), three SNPs from MUC22 (mucin 22), two SNPs from PSORS1C1 (psoriasis susceptibility 1 candidate 1) and one SNP from TCF19 (transcription factor 19). A four-locus interaction between these four genes is also detected. The second SNP set consists of three SNPs from ATAD1 (ATPase family, AAA domain containing 1).
Overall, the findings indicate the detection of pure epistasis in the presence of genetic heterogeneity and provide an alternative explanation for the aetiology of T1D in the UK population.
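A single two-locus analysis, the building block that 2LOmb assembles into ensembles, can be illustrated as follows. This is only a minimal sketch, assuming a Pearson chi-square statistic on the genotype-pair by case/control table and a plain label-permutation p-value; the abstract does not specify these details, and the function names, the toy data and the parameters `n_perm` and `seed` are hypothetical. It is not the 2LOmb algorithm itself.

```python
import random
from collections import Counter


def two_locus_chi2(geno_a, geno_b, status):
    """Pearson chi-square statistic for the (genotype pair) x (case/control) table."""
    n = len(status)
    pairs = list(zip(geno_a, geno_b))
    cell = Counter(zip(pairs, status))   # observed count per (pair, status) cell
    row = Counter(pairs)                 # count per genotype pair
    col = Counter(status)                # count per status label
    chi2 = 0.0
    for pair, n_row in row.items():
        for label, n_col in col.items():
            expected = n_row * n_col / n
            observed = cell[(pair, label)]
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def permutation_p_value(geno_a, geno_b, status, n_perm=200, seed=0):
    """Estimate a p-value by shuffling the case/control labels (assumed scheme)."""
    rng = random.Random(seed)
    observed = two_locus_chi2(geno_a, geno_b, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if two_locus_chi2(geno_a, geno_b, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: status depends on the genotype pair (XOR-like pattern),
# but each locus on its own is split evenly between cases and controls.
geno_a = [0, 0, 1, 1] * 25
geno_b = [0, 1, 0, 1] * 25
status = [0, 1, 1, 0] * 25

chi2_value = two_locus_chi2(geno_a, geno_b, status)
p_value = permutation_p_value(geno_a, geno_b, status)
print(chi2_value, p_value)
```

In this toy data neither locus alone separates cases from controls, whereas the genotype pair does, which is the situation a two-locus analysis is meant to pick up.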
Highlights
Epistasis or gene-gene interactions are among many causes of complex diseases (Moore 2005).
In tests with small-scale simulated data, 2LOmb is benchmarked against multifactor dimensionality reduction (MDR) and random forest (RF) in a simulation trial involving both pure epistasis and genetic heterogeneity.
An output from an efficient algorithm should contain a low number of single nucleotide polymorphisms (SNPs) and a high number of correctly identified causative SNPs.
Summary
Epistasis or gene-gene interactions are among many causes of complex diseases (Moore 2005). In the simplest form, epistasis can be described by two-locus disease models, in which both loci jointly contribute towards the disease susceptibility (Neuman and Rice 1992; Schork et al 1993). Many attempts have been made to provide consistent definitions of epistasis (Cordell 2002; Hallgrímsdóttir and Yuster 2008; Li and Reich 2000; Marchini et al 2005; Musani et al 2007; Verhoeven et al 2010). Regardless of the preferred definition, the common ground is that epistasis describes an effect that deviates from the combined individual effects of each genetic factor, that is, an effect that departs from a linear addition of individual effects (Fisher 1918). The detection of epistasis provides necessary information complementary to that gained through single-locus analysis.
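The notion of a two-locus model with no single-locus signal can be made concrete with a small calculation. The sketch below uses a hypothetical penetrance table and assumes Hardy-Weinberg genotype frequencies with an allele frequency of 0.5 at both loci; none of these numbers come from the article. It shows that the disease risk varies with the genotype pair while the marginal risk at each locus is the same for every genotype, which is what "pure" epistasis means here.

```python
import math

# Assumed genotype frequencies (Hardy-Weinberg, allele frequency 0.5),
# ordered AA, Aa, aa at locus A and BB, Bb, bb at locus B.
GENOTYPE_FREQ = [0.25, 0.5, 0.25]

# Hypothetical penetrance table:
# PENETRANCE[i][j] = P(disease | genotype i at locus A, genotype j at locus B).
PENETRANCE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def marginal_penetrance(table, freq, locus):
    """Average the penetrance over the other locus for each genotype at `locus`."""
    if locus == 0:
        return [sum(table[i][j] * freq[j] for j in range(3)) for i in range(3)]
    return [sum(table[i][j] * freq[i] for i in range(3)) for j in range(3)]


def prevalence(table, freq):
    """Population-wide disease risk implied by the table and frequencies."""
    return sum(
        table[i][j] * freq[i] * freq[j] for i in range(3) for j in range(3)
    )


marginal_a = marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 0)
marginal_b = marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 1)
overall = prevalence(PENETRANCE, GENOTYPE_FREQ)

# Every marginal penetrance equals the overall prevalence, so a single-locus
# test has nothing to detect even though the joint table is far from flat.
no_marginal_effect = all(
    math.isclose(m, overall) for m in marginal_a + marginal_b
)
print(marginal_a, marginal_b, overall, no_marginal_effect)
```

Because the marginal risks are flat, screening each SNP separately would miss both loci in such a model, which is why joint analysis of SNP pairs is needed.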