Abstract
The decreasing cost of high-throughput DNA sequencing technologies provides a huge amount of data that enables researchers to determine haplotypes for diploid and polyploid organisms. Although various methods have been developed to reconstruct haplotypes for diploid organisms, achieving high accuracy remains a challenge. Moreover, most current methods cannot be applied to polyploid organisms. In this paper, an iterative method is proposed that employs a hypergraph to reconstruct haplotypes. The proposed method then uses a chaotic viewpoint to enhance the obtained haplotypes. To this end, a haplotype set is randomly generated as an initial estimate, and its consistency with the input fragments is described by constructing a weighted hypergraph. Partitioning the hypergraph identifies the positions in the haplotype set that need to be corrected. This procedure is repeated until no further improvement can be achieved. Each element of the finalized haplotype set is mapped to a line by chaos game representation, and a coordinate series is defined based on the positions of the mapped points. Positions with low quality can then be assessed by applying a local projection. Experimental results on both simulated and real datasets demonstrate that this method outperforms most other approaches and is promising for haplotype assembly.
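The mapping of a haplotype to a line by chaos game representation can be illustrated with a minimal sketch. It assumes a biallelic haplotype encoded as a string of '0' and '1', the endpoints 0 and 1 of the unit interval as the two "corners", and a starting point of 0.5. The function name `cgr_line` and these choices are illustrative assumptions, not details taken from the paper.

```python
def cgr_line(haplotype, start=0.5):
    """Map a binary haplotype to a coordinate series on [0, 1].

    Assumed 1-D chaos game: at each position the current point moves
    halfway toward the endpoint (0.0 or 1.0) that matches the allele.
    """
    points = []
    x = start
    for allele in haplotype:
        if allele not in ("0", "1"):
            raise ValueError(f"unexpected allele: {allele!r}")
        corner = float(allele)   # '0' -> 0.0, '1' -> 1.0
        x = (x + corner) / 2.0   # move halfway toward that endpoint
        points.append(x)
    return points


series = cgr_line("0110")
print(series)  # [0.25, 0.625, 0.8125, 0.40625]
```

Each coordinate depends on the whole prefix of the haplotype, so a single erroneous allele shifts every later point. That is the kind of signal a local projection on the coordinate series could exploit.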
Highlights
Improvements in high-throughput DNA sequencing technologies have dramatically decreased the cost of genome sequencing
There are reports indicating that different populations may respond differently to drugs [8,9,10]. These findings demonstrate that haplotypes in human genomic data could be a useful and informative tool for mapping genes that are involved in representative diseases, as well as for personalized medicine [11]
Reconstruction rate (RR) [4], a conventional metric, was used to evaluate the quality of the obtained haplotypes
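The text does not spell out the formula for the reconstruction rate. The sketch below uses the commonly cited diploid definition, RR = 1 - min(D(h1, e1) + D(h2, e2), D(h1, e2) + D(h2, e1)) / (2n), where D is the Hamming distance and n the haplotype length. That this matches the definition in [4] is an assumption, and the function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reconstruction_rate(true_pair, est_pair):
    """Diploid reconstruction rate under the assumed definition.

    The estimated haplotypes are unordered, so both pairings with the
    true haplotypes are tried and the better one is kept.
    """
    (h1, h2), (e1, e2) = true_pair, est_pair
    n = len(h1)
    direct = hamming(h1, e1) + hamming(h2, e2)
    swapped = hamming(h1, e2) + hamming(h2, e1)
    return 1.0 - min(direct, swapped) / (2 * n)


rr = reconstruction_rate(("0101", "1010"), ("1010", "0111"))
print(rr)  # 0.875: one wrong allele out of eight after swapping the pair
```

For polyploid data the same idea would need a minimum over all pairings of true and estimated haplotypes, which this sketch does not cover.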
Summary
Improvements in high-throughput DNA sequencing technologies have dramatically decreased the cost of genome sequencing. These methods start with a set of arbitrary sequences as initial haplotypes and improve it step by step with respect to the input fragments. They build a similar weighted graph within their own distinctive models. Several MEC-based approaches have been developed to solve this problem. In this regard, the input fragments are organized into P clusters, and the haplotypes are considered the centers of the constructed clusters. The current work aims to address these challenges through a better description of the similarity between the input fragments. This was done by a heuristic method with a favorable runtime, based on the hypergraph model. We used the emission probability $P(X_j \mid h_j, R_j)$ defined in [41] to identify errors in the reconstructed haplotype. This measure is calculated for each position j. The entry $\hat{h}_i(j)$ is then set as follows:

$$
\hat{h}_i(j) =
\begin{cases}
0 & \text{if } h_i(j) = \text{'-'} \text{ and } \widehat{cs}_i(j) \le 0.5, \\
1 & \text{if } h_i(j) = \text{'-'} \text{ and } \widehat{cs}_i(j) > 0.5, \\
h_i(j) & \text{otherwise.}
\end{cases}
\tag{11}
$$
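The clustering view of MEC-based approaches mentioned above can be sketched as follows: each fragment is assigned to the haplotype it disagrees with least, and the MEC score is the total number of disagreements. The sketch assumes fragments and haplotypes are equal-length strings over '0' and '1', with '-' marking positions a fragment does not cover. The names `GAP`, `distance`, and `mec_score` are illustrative, and this is the generic MEC objective rather than the hypergraph method of the paper.

```python
GAP = "-"  # assumed symbol for positions not covered by a fragment


def distance(fragment, haplotype):
    """Count mismatches between a fragment and a haplotype, skipping gaps."""
    return sum(f != GAP and f != h for f, h in zip(fragment, haplotype))


def mec_score(fragments, haplotypes):
    """Assign each fragment to its closest haplotype (cluster center).

    Returns the total number of corrections needed (the MEC score) and
    the index of the haplotype chosen for each fragment.
    """
    total = 0
    assignment = []
    for frag in fragments:
        dists = [distance(frag, h) for h in haplotypes]
        best = min(range(len(haplotypes)), key=dists.__getitem__)
        assignment.append(best)
        total += dists[best]
    return total, assignment


fragments = ["01--", "-10-", "10-1", "--01"]
haplotypes = ["0101", "1010"]
print(mec_score(fragments, haplotypes))  # (1, [0, 0, 1, 0])
```

Because `haplotypes` can hold any number of sequences, the same function covers the polyploid case with P clusters.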