Abstract

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly. However, current long-read assemblers are either reference based, which introduces biases, or fail to capture the haplotype diversity of diploid genomes. We present phasebook, a de novo approach for reconstructing the haplotypes of diploid genomes from long reads. phasebook outperforms other approaches in haplotype coverage by large margins while remaining competitive in terms of assembly errors and assembly contiguity.

Highlights

  • There are generally multiple copies of hereditary material in most eukaryotic organisms, where each copy is inherited from one of the ancestors

  • Benchmarking results on both simulated and real data indicate that our method outperforms state-of-the-art tools in several respects

  • In the benchmarking experiments, we focus on criteria that relate to phasing performance, such as haplotype coverage, k-mer recovery rate, NGA50, phased N50, and switch error rate, in addition to a basic evaluation of haplotigs in terms of error content


Introduction

There are generally multiple copies of hereditary material in most eukaryotic organisms, where each copy is inherited from one of the ancestors. The copy-specific nucleic acid sequences are called haplotypes, and the haplotype-specific contigs that assembly programs compute to reconstruct them are often referred to as haplotigs. Most vertebrates (such as humans and mice) and many higher plants (such as maize and Arabidopsis) are diploid, which means that there are two copies of each chromosome. Haplotype reconstruction plays a crucial role in various disciplines. In functional genomics, haplotype information is important because allele-specific gene expression is widespread across the human genome [1]. In conservation genomics, it supports studies of population demography, gene flow, and selection [2]. In precision medicine, haplotype-specific combinations of genetic variants often affect disease phenotypes and clinical responses [3, 4].
