Abstract

Background: In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important as it allows identifying genetic markers that tag the actual causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may also interfere with the detection of genuine epistasis signals in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. GPD may involve unlinked genetic markers, even residing on different chromosomes. Often GPD is eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD-pruning that gives a balance between false positive control and sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction (MB-MDR) as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs.

Results: Using real-life human LD patterns from a homogeneous subpopulation of British ancestry, we investigated the impact of LD-pruning on the statistical sensitivity of MB-MDR. We considered three different non-fully penetrant epistasis models with varying effect sizes. There is a clear advantage in pre-analysis pruning using sliding windows at r² of 0.75 or lower, but using a threshold of 0.20 has a detrimental effect on the power to detect a functional interactive SNP pair (power < 25%). Signal sensitivity, directly using LD-block information to determine whether an epistasis signal is present or not, benefits from LD-pruning as well (average power across scenarios: 87%), but is largely hampered by functional loci residing at the boundaries of an LD-block.

Conclusions: Our results confirm that LD patterns and the position of causal variants in LD blocks do have an impact on epistasis detection, and that pruning strategies and LD-block definitions combined need careful attention, if we wish to maximize the power of large-scale epistasis screenings.
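
To make the sliding-window pruning mentioned in the Results more concrete, the following is a minimal Python sketch of a greedy window-based pruning scheme. It is an illustration under stated assumptions, not the procedure used in the study: the function name `ld_prune`, the `window` and `step` parameters, and the use of the squared Pearson correlation of allele counts as r² are all choices made here for the example.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold=0.75, window=50, step=5):
    """Greedy sliding-window LD pruning (illustrative sketch).

    genotypes: (n_samples, n_snps) array of minor-allele counts (0, 1, 2),
               assumed to be ordered by genomic position and polymorphic.
    Returns the sorted indices of the SNPs that are kept.
    """
    n_snps = genotypes.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        # SNPs inside the current window that have not been pruned yet
        idx = [j for j in range(start, min(start + window, n_snps)) if keep[j]]
        if len(idx) < 2:
            continue
        # Assumption: r^2 is approximated by the squared genotype correlation
        r2 = np.corrcoef(genotypes[:, idx], rowvar=False) ** 2
        for a in range(len(idx)):
            if not keep[idx[a]]:
                continue
            for b in range(a + 1, len(idx)):
                # Drop the later SNP of any pair exceeding the threshold
                if keep[idx[b]] and r2[a, b] > r2_threshold:
                    keep[idx[b]] = False
    return np.flatnonzero(keep)

# Toy data (hypothetical): SNP 1 is an exact copy of SNP 0, SNP 2 is nearly independent
g0 = np.array([0, 1, 2, 0, 1, 2, 0, 1])
g2 = np.array([2, 0, 1, 1, 0, 2, 0, 1])
toy = np.column_stack([g0, g0.copy(), g2])
kept = ld_prune(toy, r2_threshold=0.75)
print(kept)  # SNP 1 duplicates SNP 0 and is pruned
```

Lowering `r2_threshold` removes more SNPs. The abstract reports that pruning at 0.75 or lower helps, but that a threshold as strict as 0.20 costs power to detect the functional pair.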

Highlights

  • A single-nucleotide polymorphism (SNP) is a variation in a single nucleotide that occurs at a specific position in the genome, where each variation is commonly present within a population (e.g. > 1 %)

  • Type I error estimates were below 1% for all Linkage Disequilibrium (LD) block scenarios and every LD-pruning threshold

  • The genetic model implemented in our study, along with the odds-ratio effect sizes for disease risk and the genotype penetrance table, was inspired by the suspected pure biological epistatic effect between ERAP1 and HLA-B*27 in Ankylosing Spondylitis, which has a prevalence of ∼ 1% in the general population of British ancestry [33, 36]


Summary

Introduction

A single-nucleotide polymorphism (SNP) is a variation in a single nucleotide that occurs at a specific position in the genome, where each variant is commonly present within a population (e.g. > 1%). The success of Genome-Wide Association Studies (GWAS) using SNPs as genetic markers relies in part on Linkage Disequilibrium (LD) as a population concept: LD can be considered linkage between markers on a population scale [1]. It differs from Gametic Phase Disequilibrium (GPD), which describes the non-random association of alleles within gametes (even for physically unlinked loci on different chromosomes). In GWAS, LD is important because it allows identifying genetic markers that tag the actual causal variants. Little is known about the optimal degree of GPD/LD-pruning that balances false positive control against sufficient power of epistasis detection statistics. The performance of Model-Based Multifactor Dimensionality Reduction (MB-MDR) has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs.
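
For readers less familiar with how LD is quantified, the standard population-genetic definitions of the disequilibrium coefficient and of r² are sketched below. The notation is an assumption introduced here and does not appear in the source: p_A and p_B denote allele frequencies at two biallelic loci, and p_AB the frequency of the haplotype carrying both alleles.

```latex
% Assumed notation (not from the source): p_A, p_B are allele frequencies
% at two biallelic loci; p_{AB} is the frequency of the A-B haplotype.
\begin{align}
D_{AB} &= p_{AB} - p_A\,p_B, \\
r^2 &= \frac{D_{AB}^2}{p_A (1 - p_A)\, p_B (1 - p_B)}.
\end{align}
```

With these definitions r² is 0 when the alleles at the two loci are independent and 1 when they are perfectly correlated, which is the scale on which pruning thresholds such as 0.75 or 0.20 are expressed.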
