Abstract

Epistasis is an important factor in complex disease, and powerful methods are needed to mine large-scale genome-wide data efficiently. Existing epistasis detection methods suffer from low efficiency, low accuracy, and an inability to process large numbers of SNPs. In this work, we propose a k-tree-optimized Bayesian network (BN) epistasis mining method that can handle large numbers of SNPs. First, we construct a k-tree over the SNP loci and phenotype traits by sampling Dandelion codes uniformly. The k-tree is then decomposed into k-cliques using the degree selection tree decomposition algorithm. Within each k-clique, the optimized fast incremental association BN learning method (omb-Fast) learns a sub-Bayesian network quickly and accurately. Finally, all sub-networks are merged to obtain the whole network. These operations (k-tree generation, k-tree decomposition, network generation) are repeated several times to identify the epistatic loci affecting the phenotype traits. Simulation experiments validate the effectiveness of our method. The results show that, compared with other methods, the proposed method achieves higher epistasis detection accuracy, a lower false positive rate, and a higher F1-score while maintaining efficiency. Most importantly, it can be applied to large-scale whole-genome SNP data for epistasis detection.
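
To make the k-tree step concrete, the following is a minimal Python sketch of how a k-tree over variables (SNP loci plus a phenotype node) can be grown and split into overlapping groups for local network learning. It is an illustration under stated assumptions, not the method of this work: it uses a simple incremental construction (attach each new vertex to a randomly chosen existing k-clique) rather than uniform sampling of Dandelion codes, and it records the (k+1)-vertex bags produced by that construction rather than running the degree selection tree decomposition algorithm. The function name `random_k_tree` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def random_k_tree(n, k, seed=None):
    """Grow a random k-tree on vertices 0..n-1 (hypothetical helper).

    Assumption: incremental construction, NOT uniform Dandelion-code sampling.
    Returns (edges, bags), where each bag is a (k+1)-clique of the k-tree and
    would be the variable set handed to a local BN learner.
    """
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and at least k + 1 vertices")
    rng = random.Random(seed)

    # Start from a complete graph on the first k + 1 vertices.
    root = tuple(range(k + 1))
    edges = set(combinations(root, 2))
    k_cliques = list(combinations(root, k))
    bags = [root]

    for v in range(k + 1, n):
        # Attach the new vertex to every member of one existing k-clique.
        base = rng.choice(k_cliques)
        edges.update((u, v) for u in base)
        bags.append(base + (v,))
        # Each (k-1)-subset of the base, together with v, is a new k-clique.
        k_cliques.extend(sub + (v,) for sub in combinations(base, k - 1))

    return edges, bags


if __name__ == "__main__":
    edges, bags = random_k_tree(n=10, k=3, seed=0)
    print(len(edges), "edges;", len(bags), "bags of size", len(bags[0]))
```

Because every bag has exactly k + 1 variables, the cost of structure learning inside a bag depends on k rather than on the total number of SNPs, which is the property that makes a decomposition of this kind attractive at genome scale.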
