Abstract

Reconstructing haplotypes from sequencing data is one of the major challenges in genetics. Haplotypes play a crucial role in many analyses, including genome-wide association studies and population genetics. Haplotype reconstruction becomes more difficult as the number of homologous chromosomes increases, as is often the case for polyploid plants. This complexity is compounded by higher heterozygosity, that is, more frequent variants between haplotypes. We have designed Ranbow, a new tool for haplotype reconstruction of polyploid genomes from short-read sequencing data. Ranbow integrates all types of small variants at bi- and multi-allelic sites to reconstruct haplotypes. To evaluate Ranbow and currently available competing methods on real data, we have created and released a real gold-standard dataset from sweet potato sequencing data. Our evaluations on real and simulated data clearly show Ranbow’s superior performance in terms of accuracy, haplotype length, memory usage, and running time. Specifically, Ranbow is one order of magnitude faster than the next best method. The efficiency and accuracy of Ranbow make whole-genome haplotype reconstruction of complex genomes with higher ploidy feasible.

Highlights

  • We identified the Ambiguity of Merging (AoM) fragments problem as one of the major technical challenges specific to polyploid haplotype reconstruction

  • This paper focuses on sites that vary by single-base substitution, multi-base substitutions, and small (


Summary

Introduction

The rapid advances in sequencing technologies and assembly tools have enabled the assembly of reference genomes from multiple organisms [1,2,3]. Though useful, such reference sequences do not reflect the wealth of information carried by each individual chromosome. A reference sequence is a consensus of the homologous chromosomes and only approximates each of them. Knowing the sequence of each single chromosome gives a better view of the genome. The sequence of variants on a single copy of a chromosome is called a haplotype [5, 6]. Haplotyping plays an important role in a multitude of biological analyses, such as genome-wide association studies and imputation [7,8,9], population genetics studies [10, 11], genome regulation [4, 12, 13], and genotype error detection [14, 15].
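The difference between a consensus reference and the individual haplotypes can be illustrated with a small sketch. The data below are invented for illustration (a hypothetical tetraploid with five variant sites), and the helper names `consensus` and `heterozygous_sites` are assumptions of this example; they are not part of Ranbow and do not reflect how it works.

```python
from collections import Counter

# Hypothetical toy data: four homologous chromosome copies (a tetraploid),
# each written as its alleles at five variant sites.
haplotypes = [
    "ACGTA",
    "ACGTA",
    "ACGAA",
    "TGGTC",
]

def consensus(haps):
    """Return the most frequent allele at each site across all copies."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*haps))

def heterozygous_sites(haps):
    """Return indices of sites where the copies do not all share one allele."""
    return [i for i, col in enumerate(zip(*haps)) if len(set(col)) > 1]

ref = consensus(haplotypes)
het = heterozygous_sites(haplotypes)
# Number of sites at which each haplotype differs from the consensus.
mismatches = [sum(a != b for a, b in zip(h, ref)) for h in haplotypes]

print(ref)         # consensus sequence over the variant sites
print(het)         # sites that vary between copies
print(mismatches)  # per-haplotype distance to the consensus
```

In this toy case the consensus coincides with two of the four copies and hides the variants carried by the other two, which is why haplotype-level sequences carry information that a single reference does not.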
