Abstract

Haplotype phasing is a well studied problem in the context of genotype data. With the recent developments in high-throughput sequencing, new algorithms are needed for haplotype phasing, when the number of samples sequenced is low and when the sequencing coverage is blow. High-throughput sequencing technologies enables new possibilities for the inference of haplotypes. Since each read is originated from a single chromosome, all the variant sites it covers must derive from the same haplotype. Moreover, the sequencing process yields much higher SNP density than previous methods, resulting in a higher correlation between neighboring SNPs. We offer a new approach for haplotype phasing, which leverages on these two properties. Our suggested algorithm, called Perfect Phlogeny Haplotypes from Sequencing (PPHS) uses a perfect phylogeny model and it models the sequencing errors explicitly. We evaluated our method on real and simulated data, and we demonstrate that the algorithm outperforms previous methods when the sequencing error rate is high or when coverage is low.

Highlights

  • The etiology of complex diseases is composed of both environmental and genetic factors

  • These studies have been focusing on measurements of single nucleotide polymorphisms (SNPs), which are positions in the genome in which at some point in history there has been a mutation that was fixed in the population

  • Our algorithm aims at finding a perfect phylogeny tree on the set of SNPs in a given window, and a corresponding haplotype assignment for each individual

Read more

Summary

Introduction

The etiology of complex diseases is composed of both environmental and genetic factors. Much of this effort has been focused on genome-wide association studies (GWAS), in which the DNA of a population of cases (individuals carrying the studied condition), and a population of controls (general population) is being measured and compared. These studies have been focusing on measurements of single nucleotide polymorphisms (SNPs), which are positions in the genome in which at some point in history there has been a mutation that was fixed in the population.

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.