Abstract

Background: A central question for disease studies and crop improvement is how genetic variants drive phenotypes. Genome-wide association studies (GWAS) provide a powerful tool for characterizing genotype-phenotype relationships in complex traits and diseases. Epistasis (gene-gene interaction), including high-order interaction among more than two genes, often plays an important role in complex traits and diseases, but current GWAS analysis usually focuses only on the additive effects of single nucleotide polymorphisms (SNPs). The lack of effective computational modelling of high-order functional interactions often leads to significant under-utilization of GWAS data.

Results: We have developed a novel Bayesian computational method with a Markov Chain Monte Carlo (MCMC) search, and implemented it as the Bayesian High-order Interaction Toolkit (BHIT) for detecting epistatic interactions among SNPs. BHIT first builds a Bayesian model on both continuous and discrete data; the model is capable of detecting high-order interactions among SNPs related to case-control or quantitative phenotypes. We also developed a pipeline that enables users to apply BHIT to different species in different use cases.

Conclusions: Using both simulation data and soybean seed composition studies on oil content and protein content, BHIT effectively detected some high-order interactions associated with phenotypes, and it outperformed a number of other available tools. BHIT is freely available for academic users at http://digbio.missouri.edu/BHIT/.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2217-6) contains supplementary material, which is available to authorized users.

Highlights

  • A central question for disease studies and crop improvement is how genetic variants drive phenotypes

  • The Bayesian High-order Interaction Toolkit (BHIT) was run for 1,000,000 Markov Chain Monte Carlo (MCMC) iterations, with the first 990,000 treated as the burn-in period to ensure convergence; 0.5 was chosen as the threshold on the posterior probabilities to determine the dependency between each locus and the phenotype

  • Epistasis may cause hidden quantitative genetic variation in natural populations and could be responsible for the small additive effects, missing heritability and the lack of replication, which are typically observed for human complex traits [9, 40]
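The burn-in and thresholding step described in the highlights can be illustrated with a minimal sketch. It assumes the sampler records, at every iteration, a 0/1 indicator per locus saying whether that locus is currently modelled as associated with the phenotype. The function names, the toy chain, and the strict "greater than" comparison are assumptions made for illustration, not BHIT's actual code or output format.

```python
def posterior_inclusion(samples, burn_in):
    """Fraction of post-burn-in iterations in which each locus is flagged.

    samples: list of per-iteration lists of 0/1 indicators, one per locus
             (a hypothetical format, not BHIT's real output).
    burn_in: number of initial iterations to discard.
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("burn_in must be smaller than the number of samples")
    n_loci = len(kept[0])
    return [sum(s[j] for s in kept) / len(kept) for j in range(n_loci)]


def select_loci(samples, burn_in, threshold=0.5):
    """Indices of loci whose posterior probability exceeds the threshold."""
    probs = posterior_inclusion(samples, burn_in)
    return [j for j, p in enumerate(probs) if p > threshold]


# Deterministic toy chain standing in for sampler output, scaled down from
# the 1,000,000 iterations / 990,000 burn-in quoted in the text.
n_iter, burn_in = 10_000, 9_900
samples = [
    [1,                        # locus 0: flagged in every iteration
     1 if i % 10 == 0 else 0,  # locus 1: flagged in 1 of every 10 iterations
     0 if i % 4 == 0 else 1,   # locus 2: flagged in 3 of every 4 iterations
     0]                        # locus 3: never flagged
    for i in range(n_iter)
]

probs = posterior_inclusion(samples, burn_in)
selected = select_loci(samples, burn_in)
print(probs, selected)
```

Only the 100 iterations after burn-in contribute to the estimates here, mirroring the 10,000 retained iterations in the setting quoted above.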


Summary

Introduction

A central question for disease studies and crop improvement is how genetic variants drive phenotypes. The lack of effective computational modelling of high-order functional interactions often leads to significant under-utilization of GWAS data. In this era of explosive genomics development and next-generation sequencing (NGS) data, the genome-wide association study (GWAS) is central to characterizing complex traits and diseases [1]. The major challenge in detecting SNP interactions from whole-genome-scale data is computing time [4, 5, 9]. Researchers have developed several methods to address this issue in detecting and exploring SNP interactions [5]. These methods use four strategies: exhaustive search, heuristic search, sampling, and two-stage search. The two-stage search strategy splits the search into two steps, first filtering candidates and then identifying interactions among them, as in SNPHarvester [15] and TRM [16].
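To make the computing-time problem concrete, the sketch below counts how many SNP sets an exhaustive search would have to score at each interaction order, and shows a brute-force search over a tiny panel. The panel size of 500,000 SNPs and the `score` callback are hypothetical choices for illustration; they are not taken from the paper or from any of the cited tools.

```python
from itertools import combinations
from math import comb


def search_space_size(n_snps, order):
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


def exhaustive_search(n_snps, order, score):
    """Score every SNP set of the given order and return the best one.

    score: hypothetical user-supplied function mapping a tuple of SNP
           indices to a number (higher is better).
    """
    return max(combinations(range(n_snps), order), key=score)


# Growth of the search space for an assumed panel of 500,000 SNPs.
for order in (2, 3, 4):
    print(order, search_space_size(500_000, order))

# Brute force is only practical on very small panels like this one.
best = exhaustive_search(5, 2, score=lambda snps: -abs(sum(snps) - 7))
print(best)
```

Each added SNP in the interaction multiplies the number of candidate sets by roughly the panel size, which is why the heuristic, sampling, and two-stage strategies avoid enumerating every set.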

