Abstract

Single nucleotide polymorphism (SNP) is the most frequent form of human genetic variations and of importance for medical diagnosis and tracking disease genes. A haplotype is a sequence of SNPs from a single copy of a chromosome, and haplotype assembly from SNP fragments is based on DNA fragments with SNPs and the methodology of shotgun sequence assembly. In contrast to conventional combinatorial models which aim at different error types in SNP fragments, in this paper we propose a new statistical model - a Markov chain model for haplotype assembly based on information of SNP fragments. The main advantage of this model over combinatorial ones is that it requires no prior information on error types in data. In addition, unlike exact algorithms with the exponential-time computation complexity for most combinatorial models, the proposed model can be solved in polynomial time and thus is efficient for large-scale problems. Experiment results on several data sets illustrate the effectiveness of the new method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call