Abstract

Background: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively.

Results: HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.’s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms: Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate.

Conclusion: HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.’s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.

Highlights

  • Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing

  • We evaluate HaploJuice against five different assembly algorithms, Hmmfreq [8], ShoRAH [3], SAVAGE [4], PredictHaplo [5] and QuRe [6], using simulated and real data sets in which three sequences are mixed in known frequencies

  • The three sub-samples were mixed in various proportions: 5:4:1, 5:3:2, 6:3:1, and 7:2:1 (100 data sets each). Paired-end reads of length 150 with a total coverage of 1500x were simulated by ART [11] with the default Illumina error model from three haplotypes of length 10k. The haplotypes were generated by INDELible [12] under the Jukes and Cantor (JC) model [13] from a three-tipped tree with a root-to-tip distance of 0.05, randomly created by Evolver [14] from the PAML [15] package
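To make the simulation setup concrete, the expected sequencing depth of each sub-sample follows directly from the mixing ratio and the total coverage. The sketch below is only illustrative arithmetic: the function name `per_sample_coverage` is hypothetical and is not part of HaploJuice or ART.

```python
from fractions import Fraction

def per_sample_coverage(ratio, total_coverage):
    """Split a total sequencing depth across sub-samples by mixing ratio.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    total_parts = sum(ratio)
    # Exact fractions avoid floating-point rounding in the split.
    return [Fraction(part, total_parts) * total_coverage for part in ratio]

# Mixing ratios and total coverage (1500x) as stated in the highlights.
ratios = [(5, 4, 1), (5, 3, 2), (6, 3, 1), (7, 2, 1)]
for ratio in ratios:
    depths = per_sample_coverage(ratio, 1500)
    label = ":".join(str(part) for part in ratio)
    print(label, [float(depth) for depth in depths])
```

Under the 5:4:1 ratio, for example, the rarest sub-sample is expected to receive about 150x of the 1500x total, which is the depth available for reconstructing that haplotype.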


Summary

Introduction

Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Given the huge volume of throughput, it is often cost-effective to mix multiple sub-samples in a single sample for sequencing, a process called pooling. A short, uniquely identifiable sequence tag (i.e. a barcode) is often appended to each DNA molecule of the same sub-sample before pooling and sequencing. If haplotypes can be reconstructed efficiently from mixtures of samples without barcodes, sequencing costs may be reduced significantly. In the barcode-free strategy proposed by Ranjard et al., three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively.
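One way to see why known, unequal proportions can stand in for barcodes: with proportions 62.5%, 25% and 12.5%, every non-empty subset of the three sub-samples sums to a different value, so the frequency of an allele at a variable site hints at which haplotypes carry it. The sketch below illustrates only this intuition. It is an assumption-laden toy, not the algorithm of Ranjard et al. or of HaploJuice, and the function names are hypothetical.

```python
from itertools import combinations

def subset_frequencies(proportions):
    """Map each non-empty subset of haplotype indices to its expected allele frequency."""
    indices = range(len(proportions))
    table = {}
    for size in range(1, len(proportions) + 1):
        for subset in combinations(indices, size):
            table[subset] = sum(proportions[i] for i in subset)
    return table

def nearest_subset(observed_freq, proportions):
    """Return the subset whose expected frequency is closest to the observed one.

    Toy nearest-match rule for illustration only; real data would need a
    proper error model and linkage across sites.
    """
    table = subset_frequencies(proportions)
    return min(table, key=lambda subset: abs(table[subset] - observed_freq))

proportions = (0.625, 0.25, 0.125)  # known mixing proportions from the text
for subset, freq in sorted(subset_frequencies(proportions).items(), key=lambda kv: kv[1]):
    print(subset, freq)

# An allele seen in about 36% of reads is best explained by haplotypes 1 and 2.
print(nearest_subset(0.36, proportions))
```

Because the seven expected frequencies are all distinct, a nearest-match rule is unambiguous in the noise-free case; when the empirical proportions drift from the nominal ones, the expected values shift, which is the motivation the abstract gives for estimating the proportions first.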

