A fast and accurate algorithm for single individual haplotyping

Minzhu Xie, Jianxin Wang, Tao Jiang

doi:10.1186/1752-0509-6-s2-s8

Abstract

Background

Due to the difficulty in separating the two (paternal and maternal) copies of a chromosome, most published human genome sequences provide only genotype information, i.e., the mixed information of the two underlying haplotypes. However, phased haplotype information is needed to fully understand complex genetic polymorphisms and to increase the power of genome-wide association studies for complex diseases. With the rapid development of DNA sequencing technologies, reconstructing a pair of haplotypes from an individual's aligned DNA fragments by computer algorithms (i.e., Single Individual Haplotyping) has become a practical haplotyping approach.

Results

In this paper, we combine two measures, "errors corrected" and "fragments cut", and propose a new optimization model, called Balanced Optimal Partition (BOP), for single individual haplotyping. The model generalizes two existing models, Minimum Error Correction (MEC) and Maximum Fragments Cut (MFC), and reduces to either of them under extreme parameter values. To solve the model, we design a heuristic dynamic programming algorithm, H-BOP. By limiting the number of intermediate solutions at each iteration to an appropriately chosen small integer k, H-BOP solves the model efficiently.

Conclusions

Extensive experimental results on simulated and real data show that, with k = 8, H-BOP is generally faster and more accurate than ReFHap, a recent state-of-the-art algorithm, in haplotype reconstruction. The running time of H-BOP depends linearly on some of the key parameters controlling the input size, and H-BOP scales well to large input data. The code of H-BOP is available free of charge upon request to the corresponding author.
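The MEC criterion has a standard definition: each fragment is assigned to whichever haplotype it matches better and is charged the number of alleles that would have to be corrected. The sketch below computes that score and pairs it with a simplified beam search that, in the spirit of H-BOP, keeps at most k intermediate solutions per SNP column. It is not H-BOP: it scores partial haplotypes by MEC alone, whereas BOP also weighs fragments cut. The fragment representation (a list of (SNP index, allele) pairs over heterozygous SNPs, with h2 taken as the complement of h1) and the names `mec_score` and `beam_haplotype` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a fragment is a list of (snp_index, allele) pairs,
# allele in {0, 1}; only heterozygous SNPs are considered, so h2 = 1 - h1.
Fragment = List[Tuple[int, int]]


def mec_score(fragments: List[Fragment], h1: List[int]) -> int:
    """MEC score of the pair (h1, complement of h1)."""
    total = 0
    for frag in fragments:
        mism_h1 = sum(1 for j, a in frag if a != h1[j])
        # Every allele matches exactly one of h1[j] and 1 - h1[j],
        # so mismatches against h2 equal matches against h1.
        mism_h2 = len(frag) - mism_h1
        total += min(mism_h1, mism_h2)
    return total


def beam_haplotype(fragments: List[Fragment], n_snps: int, k: int = 8) -> Tuple[List[int], int]:
    """Extend partial haplotypes column by column, keeping the k best (illustrative only)."""
    # For each SNP column, the fragments that cover it and the allele they carry.
    covering: List[List[Tuple[int, int]]] = [[] for _ in range(n_snps)]
    for f_idx, frag in enumerate(fragments):
        for j, a in frag:
            covering[j].append((f_idx, a))

    m = len(fragments)
    # State: (score, prefix of h1, per-fragment mismatches vs h1, per-fragment mismatches vs h2).
    beam = [(0, [], [0] * m, [0] * m)]
    for j in range(n_snps):
        candidates = []
        # Fix the first allele to 0: h1 and its complement give the same score.
        alleles = (0,) if j == 0 else (0, 1)
        for _, prefix, c1, c2 in beam:
            for allele in alleles:
                n1, n2 = c1[:], c2[:]
                for f_idx, a in covering[j]:
                    if a != allele:
                        n1[f_idx] += 1  # disagrees with h1, agrees with h2
                    else:
                        n2[f_idx] += 1  # agrees with h1, disagrees with h2
                # Heuristic score on the columns seen so far.
                score = sum(min(x, y) for x, y in zip(n1, n2))
                candidates.append((score, prefix + [allele], n1, n2))
        candidates.sort(key=lambda state: state[0])
        beam = candidates[:k]  # keep at most k intermediate solutions
    best_score, best_h1, _, _ = beam[0]
    return best_h1, best_score
```

For error-free fragments drawn from h1 = [0, 1, 1, 0, 1] and its complement, `beam_haplotype(frags, 5, k=8)` should recover h1 with an MEC score of 0. Fixing the first allele removes the h1/h2 symmetry, so the beam never wastes slots on the equivalent complemented solution.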

Highlights

Due to the difficulty in separating the two copies of a chromosome, most published human genome sequences provide only genotype information, i.e., the mixed information of the two underlying haplotypes.
Since we consider only heterozygous single nucleotide polymorphisms (SNPs), for each data set a haplotype h1 containing n SNPs is first generated randomly, and the other haplotype h2 is obtained by flipping each allele of h1.
Results on real data: We downloaded a real data set from the Single Individual Haplotyping (SIH) website [27], which contains the aligned, sorted, fosmid-based NGS DNA sequence fragments and the gold-standard haplotypes of a HapMap trio child, NA12878 [12].
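The simulation protocol in this highlight can be sketched as follows. Only the haplotype construction (a random h1 and an h2 obtained by flipping every allele) comes from the text; the fragment model (uniformly placed fixed-length fragments, each drawn from h1 or h2 with equal probability, with independent per-allele errors) and the parameter names `n_fragments`, `frag_len` and `error_rate` are hypothetical choices for illustration, not the paper's settings.

```python
import random
from typing import List, Tuple

Fragment = List[Tuple[int, int]]  # hypothetical: (snp_index, allele) pairs


def simulate(n: int, n_fragments: int, frag_len: int, error_rate: float,
             seed: int = 0) -> Tuple[List[int], List[int], List[Fragment]]:
    """Generate a heterozygous haplotype pair and noisy fragments (assumes frag_len <= n)."""
    rng = random.Random(seed)
    h1 = [rng.randint(0, 1) for _ in range(n)]   # random haplotype of n SNPs
    h2 = [1 - a for a in h1]                      # flip each allele of h1
    fragments: List[Fragment] = []
    for _ in range(n_fragments):
        start = rng.randrange(0, n - frag_len + 1)     # uniform placement (assumed)
        source = h1 if rng.random() < 0.5 else h2      # which chromosome copy is read
        frag: Fragment = []
        for j in range(start, start + frag_len):
            allele = source[j]
            if rng.random() < error_rate:
                allele = 1 - allele                    # simulated sequencing error
            frag.append((j, allele))
        fragments.append(frag)
    return h1, h2, fragments
```

Setting `error_rate` to 0 yields fragments that agree perfectly with one of the two haplotypes, which is a convenient sanity check before adding noise.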

Summary

Introduction

Due to the difficulty in separating the two (paternal and maternal) copies of a chromosome, most published human genome sequences provide only genotype information, i.e., the mixed information of the two underlying haplotypes. Identifying the combination of alleles at the SNP loci on the same chromosome copy, i.e., haplotyping, is needed to fully understand human genetic variation patterns and to enhance the power of genome-wide association studies for complex diseases [2,3]. Separating the two copies of a chromosome by biological techniques is expensive and labor-intensive [4], and most published human individuals' genomes contain … Single Individual Haplotyping (SIH) assembles a pair of haplotypes from an individual's aligned DNA sequence fragments. When enough DNA sequence fragments cover two or more consecutive variant loci, SIH builds longer and more accurate haplotype blocks than haplotype inference does [12].
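Why fragments spanning two or more variant loci matter can be made concrete: SNPs can only be phased relative to each other when a chain of fragments links them, so the phaseable haplotype blocks are the connected components of a graph whose edges come from fragments. A minimal union-find sketch, assuming fragments are lists of (SNP index, allele) pairs; the representation and the name `haplotype_blocks` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def haplotype_blocks(n_snps: int, fragments: List[List[Tuple[int, int]]]) -> List[List[int]]:
    """Group SNPs into blocks linked by fragments covering two or more of them."""
    parent = list(range(n_snps))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for frag in fragments:
        snps = [j for j, _ in frag]
        for j in snps[1:]:
            union(snps[0], j)  # every SNP on one fragment joins the same block

    blocks: Dict[int, List[int]] = {}
    for j in range(n_snps):
        blocks.setdefault(find(j), []).append(j)
    return sorted(blocks.values())
```

For six SNPs and fragments covering {0, 1}, {1, 2} and {4, 5}, this yields the blocks [[0, 1, 2], [3], [4, 5]]; the singleton [3] cannot be phased relative to any other SNP, and a fragment covering a single SNP adds no phasing information.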


Journal: BMC Systems Biology
Publication Date: Jan 1, 2012
Citations: 51
License type: cc-by

Similar Papers

AROHap: An effective algorithm for single individual haplotype reconstruction based on asexual reproduction optimization
Mohammad-H Olyaee ... Alireza Khanteymoori
Computational Biology and Chemistry, Vol. 72, 14 Dec 2017

A Practical Exact Algorithm for the Individual Haplotyping Problem MEC/GI
Jianxin Wang ... Jianer Chen
Algorithmica, Vol. 56, 14 Feb 2009

HAHap: a read-based haplotyping method using hierarchical assembly
Yu-Yu Lin ... Yen-Jen Oyang
PeerJ, Vol. 6, 30 Oct 2018

Analysis of Complex Disease Association and Linkage Studies Using the University of California Santa Cruz Genome Browser
Tianyuan Wang ... Terrence S Furey
Circulation: Cardiovascular Genetics, Vol. 2, 01 Apr 2009
