A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection.

Ryan J Urbanowicz, Ambrose Ls Granizo-Mackenzie, Jeff Kiralis, Jason H Moore

doi:10.1186/1756-0381-7-8

Open Access

Journal: BioData Mining
Publication Date: Jun 9, 2014
Citations: 9
License type: cc-by

Affiliation: Dartmouth College

Abstract

Background: The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. To evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for generating complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e. all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: (1) classify and characterize pure, strict, two-locus epistatic models, (2) investigate the effect of model ‘architecture’ on detection difficulty, and (3) explore how adjusting GAMETES constraints influences diversity in the generated models.

Results: In this study we used a geometric approach to classify pure, strict, two-locus epistatic models by “shape”. In total, 33 unique shape symmetry classes were identified. Using a detection difficulty metric, we found that model shape was consistently a significant predictor of model detection difficulty. Additionally, after categorizing shape classes by the number of edges in their shape projections, we found that this edge number was also significantly predictive of detection difficulty. Analysis of constraints within GAMETES indicated that increasing model population size can expand model class coverage but does little to change the range of observed difficulty metric scores. A variable population prevalence significantly increased the range of observed difficulty metric scores and, for certain constraints, also improved model class coverage.

Conclusions: These analyses further our theoretical understanding of epistatic relationships and uncover guidelines for the effective generation of complex models using GAMETES. Specifically, (1) we have characterized 33 shape classes by edge number, detection difficulty, and observed frequency, (2) our results support the claim that model architecture directly influences detection difficulty, and (3) we found that GAMETES will generate a maximally diverse set of models with a variable population prevalence and a larger model population size. However, a model population size as small as 1,000 is likely to be sufficient.
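The abstract's definition of a pure, strict model (all n loci, but no fewer, are predictive of phenotype) can be illustrated for the two-locus case with a minimal sketch. It is not the GAMETES implementation: it assumes Hardy-Weinberg genotype frequencies, represents a model as a 3x3 penetrance table, and treats a model as pure and strict when every single-locus marginal penetrance equals the population prevalence while the joint table itself is not constant. The function names, the tolerance, and the example tables are hypothetical and chosen only for illustration.

```python
import numpy as np


def hwe_genotype_freqs(maf):
    """Genotype frequencies (0, 1, 2 minor alleles), assuming Hardy-Weinberg equilibrium."""
    q = maf
    p = 1.0 - q
    return np.array([p * p, 2.0 * p * q, q * q])


def marginal_penetrances(table, maf_a, maf_b):
    """Return per-genotype marginal penetrances for each locus and the prevalence.

    table[i][j] is the assumed probability of disease given i minor alleles
    at locus A and j minor alleles at locus B.
    """
    table = np.asarray(table, dtype=float)
    freq_a = hwe_genotype_freqs(maf_a)
    freq_b = hwe_genotype_freqs(maf_b)
    marg_a = table @ freq_b             # average over locus B genotypes
    marg_b = freq_a @ table             # average over locus A genotypes
    prevalence = freq_a @ table @ freq_b
    return marg_a, marg_b, prevalence


def is_pure_strict_two_locus(table, maf_a, maf_b, tol=1e-9):
    """Check the two-locus condition: no single locus is predictive, the pair is."""
    marg_a, marg_b, prevalence = marginal_penetrances(table, maf_a, maf_b)
    no_main_effects = (
        np.allclose(marg_a, prevalence, atol=tol)
        and np.allclose(marg_b, prevalence, atol=tol)
    )
    jointly_predictive = not np.allclose(np.asarray(table, dtype=float), prevalence, atol=tol)
    return no_main_effects and jointly_predictive


# Hypothetical example: an XOR-like table with both minor allele frequencies at 0.5.
xor_like = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

# Hypothetical counter-example: risk rises with allele count, so main effects exist.
additive_like = [
    [0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1],
    [0.0, 0.1, 0.2],
]

print(is_pure_strict_two_locus(xor_like, 0.5, 0.5))
print(is_pure_strict_two_locus(additive_like, 0.5, 0.5))
```

In the XOR-like table every row and column averages to the same prevalence, so neither locus alone carries information about phenotype, yet the two-locus genotype does. The additive-like table fails the check because its marginal penetrances differ across genotypes.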

Highlights

The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology.
It is important to note that pure, strict 2-locus epistatic models are not limited to these 33 shape classes; rather, these are the only shape classes we observed when generating over two million genetic models with GAMETES.
This study pursued three goals: (1) classify and characterize pure, strict 2-locus epistatic models, (2) explore the relationship between model architecture and detection difficulty in the generalized context of model shape, and (3) explore the maintenance of model architecture diversity in GAMETES-generated model populations to establish guidelines for effective complex model generation. Our focus on such a precise, challenging class of epistatic models lends itself both to simulation studies, in which a gold standard for algorithmic evaluation is desirable, and to real-world model detection, where our characterization of a more mathematically tractable class of epistasis may facilitate the characterization of interaction in an observed biological model.

Summary

Introduction

The statistical genetics phenomenon of epistasis, or gene-gene interaction, is widely acknowledged to confound disease etiology. It confounds the statistical search for main effects, i.e. single-locus associations with phenotype [1]. Limited by time and technology, and drawn by the appeal of “low hanging fruit”, genetic studies have typically focused on single-locus associations. For those common diseases typically regarded as complex (i.e. involving more than a single locus in the determination of phenotype), this approach has yielded limited success [10,11]. These theoretical works seek to lay the foundation for the identification and interpretation of multilocus associations as they may appear in genetic studies.

Methods

Results

Conclusion