Abstract

Background: Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but yields vary because the effects of parameter choices are incompletely understood.

Results: We parameterize the clone-based haplotyping problem in order to provide theoretical and empirical assessments of the impact of different parameters on haplotype assembly. We confirm the intuition that long clones help link together heterozygous variants and thus improve haplotype length. Furthermore, given the length of the clones, we address how to choose the other parameters, including number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with a moderate number of pools and moderate sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer than, and at least as accurate as, haplotypes of existing clone-based strategies, whether in vivo or in vitro.

Conclusions: Our results provide practical guidelines for the development and design of clone-based methods to achieve long-range, high-resolution and accurate haplotypes.

Highlights

  • Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches

  • The long fragment read (LFR) method was used in one study [13] to generate haploid fragments of length (L) 10 to 300 kbp, which were combined into 384 pools with around 5,000 to 10,000 fragments per pool

  • Assuming that clones of fixed length L arrive at random and overlapping clones come from different pools, the overlapping clones assemble into longer contigs
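
The random-arrival assumption in the last highlight can be explored with a small simulation. The sketch below is not the paper's model: it places fixed-length clones uniformly at random on a linear genome, merges overlapping clones into contigs, and compares the contig count with the classical Lander-Waterman approximation, which is swapped in here for illustration. All function names and the toy parameter values in the usage note are hypothetical, and the sketch takes for granted that every overlapping pair of clones comes from different pools and can therefore be joined.

```python
import math
import random


def merge_clones(starts, clone_len):
    """Merge fixed-length clones into contigs, returned as (start, end) tuples."""
    contigs = []
    for start in sorted(starts):
        end = start + clone_len
        if contigs and start < contigs[-1][1]:
            # The clone overlaps the current contig, so extend that contig.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # No overlap with the previous contig: start a new one.
            contigs.append([start, end])
    return [tuple(c) for c in contigs]


def expected_contigs(n_clones, clone_len, genome_len):
    """Lander-Waterman approximation N * exp(-c), with coverage c = N * L / G."""
    coverage = n_clones * clone_len / genome_len
    return n_clones * math.exp(-coverage)


def expected_contig_length(n_clones, clone_len, genome_len):
    """Lander-Waterman approximation L * (exp(c) - 1) / c for mean contig length."""
    coverage = n_clones * clone_len / genome_len
    return clone_len * (math.exp(coverage) - 1) / coverage


def simulate_contig_count(n_clones, clone_len, genome_len, seed=0):
    """Place clones uniformly at random and count the resulting contigs."""
    rng = random.Random(seed)
    starts = [rng.uniform(0, genome_len - clone_len) for _ in range(n_clones)]
    return len(merge_clones(starts, clone_len))
```

With toy values such as `simulate_contig_count(1000, 100, 100_000)`, the simulated count can be compared against `expected_contigs(1000, 100, 100_000)`. Raising the clone coverage reduces the number of contigs and lengthens them, which is the effect the highlight describes.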

Summary

Introduction

Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. In clone-based haplotyping, as long as the clones within a pool do not overlap, the clones can be computationally reconstructed from shorter sequencing reads and assembled into longer haploid sequences. In a study by Suk et al., fosmid clones with an average length of 40 kbp were combined into 288 pools, with 5,000 clones per pool [12], and the N50 length of the assembled haplotypes was 1 Mbp. Several conceptually similar haplotyping methods have recently been reported, which fragment genomic DNA in vitro and pool together the fragments for sequencing. The long fragment read (LFR) method was used in one study [13] to generate haploid fragments of length (L) 10 to 300 kbp, which were combined into 384 pools with around 5,000 to 10,000 fragments per pool.
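
Two quantities in this paragraph lend themselves to a short calculation: the N50 of a set of assembled haplotype blocks, and the chance that a clone is overlapped by another clone in the same pool. The sketch below is illustrative only. The function names are hypothetical, the overlap estimate assumes fixed-length clones placed uniformly at random on a genome whose length the caller must supply (the source does not state one), and edge effects are ignored.

```python
def n50(lengths):
    """Return the N50: the largest length such that blocks of at least
    that length together cover half or more of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def pool_overlap_probability(clone_len, n_clones, genome_len):
    """Approximate probability that a given clone is overlapped by at
    least one other clone in the same pool.

    Assumption: two random clones of length L overlap with probability
    about 2 * L / G on a genome of length G, independently per pair.
    """
    pair_overlap = min(1.0, 2 * clone_len / genome_len)
    return 1 - (1 - pair_overlap) ** (n_clones - 1)
```

For example, `pool_overlap_probability(40_000, 5_000, genome_len)` uses the clone length and pool size reported for the fosmid study, with `genome_len` left as a value the reader must choose. Note that only overlaps between clones from opposite haplotypes mix alleles, so this figure is an upper bound on problematic overlaps under the stated assumptions.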
