Abstract

Developing more effective and accurate methods for detecting epistatic loci is of great significance for crop quality improvement, disease treatment, and related applications. Because Bayesian networks (BNs) are accurate and can model non-linear relationships, they have been widely used to construct networks of SNPs and phenotypes and thereby mine epistasis. However, the BN structure search space is so large that BN-based methods cannot process large-scale SNP data. In this work, we propose EpiMCBN, an epistasis mining method that optimizes a Bayesian network using Markov chain Monte Carlo (MCMC) sampling. First, we replace the space of network structures with the space of node orders composed of SNPs and the phenotype. The MCMC algorithm is then used to sample multiple different initial orders in linear space or partial space. We use a Markov state transition matrix to move the initial samples along the Markov chain, thus obtaining multiple order samples. Next, we use the $\alpha$-BICBN scoring function to score the Bayesian networks corresponding to these node orders. By estimating the probability that each edge occurs in the Bayesian networks, we obtain an approximate Bayesian network of SNPs and phenotype, and from it the epistatic loci affecting the phenotype. Finally, we compare EpiMCBN with currently popular epistasis mining algorithms on both simulated and real age-related macular disease (AMD) datasets. Experimental results show that EpiMCBN achieves better epistasis detection accuracy, a lower false positive rate, and a higher F1-score than the other methods. Availability and implementation: Source code and dataset are available at: http://122.205.95.139/EpiMCBN/.
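
The pipeline described above (sample node orders, score the network each order implies, average edge occurrence) can be illustrated with a minimal Python sketch. It is not the EpiMCBN implementation: a plain BIC score stands in for the $\alpha$-BICBN function, whose definition is not given here; a single chain with a swap proposal stands in for the multi-start sampling and transition matrix; and each order is scored by its best-scoring network rather than by a sum over networks. All function names and the toy data generator are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def bic_family_score(data, child, parents):
    """Plain BIC of one discrete node given a parent set (stand-in for alpha-BICBN)."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_cfg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_cfg[pa]) for (pa, _), c in joint.items())
    child_states = len({r[child] for r in data})
    # Free parameters, counted over observed parent configurations only.
    n_params = (child_states - 1) * len(parent_cfg)
    return loglik - 0.5 * math.log(n) * n_params

def score_order(data, order, max_parents=2):
    """Score an order by the best network consistent with it; return (score, edges)."""
    total, edges = 0.0, set()
    for i, node in enumerate(order):
        best_score, best_set = bic_family_score(data, node, ()), ()
        for k in range(1, max_parents + 1):
            for cand in itertools.combinations(order[:i], k):
                s = bic_family_score(data, node, cand)
                if s > best_score:
                    best_score, best_set = s, cand
        total += best_score
        edges.update((p, node) for p in best_set)
    return total, edges

def order_mcmc(data, n_nodes, n_iter=200, seed=0):
    """Metropolis-Hastings over node orders; returns estimated edge probabilities."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)  # random initial order
    score, edges = score_order(data, order)
    edge_counts = Counter()
    for _ in range(n_iter):
        i, j = rng.sample(range(n_nodes), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]  # symmetric swap move
        new_score, new_edges = score_order(data, proposal)
        if rng.random() < math.exp(min(0.0, new_score - score)):
            order, score, edges = proposal, new_score, new_edges
        edge_counts.update(edges)  # record the edges of the current state
    return {e: c / n_iter for e, c in edge_counts.items()}

def phenotype_adjacency(edge_probs, phenotype, n_nodes):
    """Probability that each SNP is linked to the phenotype, in either direction."""
    return {s: edge_probs.get((s, phenotype), 0.0) + edge_probs.get((phenotype, s), 0.0)
            for s in range(n_nodes) if s != phenotype}

def simulate_toy_data(n=400, seed=1):
    """Hypothetical toy data: SNPs 0 and 1 interact, SNP 2 is noise, column 3 is the phenotype."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s0, s1, s2 = (rng.randint(0, 1) for _ in range(3))
        p_case = 0.9 if (s0 and s1) else 0.1
        rows.append((s0, s1, s2, int(rng.random() < p_case)))
    return rows

if __name__ == "__main__":
    data = simulate_toy_data()
    probs = order_mcmc(data, n_nodes=4)
    print(phenotype_adjacency(probs, phenotype=3, n_nodes=4))
```

Because BIC cannot always distinguish edge directions, the sketch reports SNP-phenotype links regardless of direction. The exhaustive parent-set search is only feasible for a handful of nodes and is used here purely for illustration.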
