Abstract
Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results. We then apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components and also integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1039-4) contains supplementary material, which is available to authorized users.
Highlights
The term “genetic variation” is often used to imply allelic combinatorics within a diploid organism such as humans or Drosophila
In order to group single-stranded families from the same fragment together, we normalize the order of the concatenation to produce a “canonical barcode”, which will be identical for both strands
The order of the canonical barcode is determined by a simple string comparison
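The canonical-barcode idea in the highlights above can be illustrated with a minimal Python sketch. It assumes that each read pair carries two barcode halves, that the canonical form places the lexicographically smaller half first, and that the original order is recorded as a strand label. The function names and the "ab"/"ba" labels are hypothetical choices for illustration and are not taken from the published tools.

```python
from collections import defaultdict


def canonical_barcode(first, second):
    """Return (canonical_barcode, order) for two barcode halves.

    Assumption: the canonical form puts the lexicographically smaller
    half first. `order` records whether the halves were already in that
    order ("ab") or had to be swapped ("ba"); the labels are illustrative.
    """
    if first <= second:  # plain string comparison
        return first + second, "ab"
    return second + first, "ba"


def group_by_fragment(reads):
    """Group read identifiers by canonical barcode, split by order.

    `reads` is an iterable of (first_half, second_half, read_id) tuples.
    Reads from the two strands of one fragment share a canonical barcode
    but fall under different order labels.
    """
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for first, second, read_id in reads:
        barcode, order = canonical_barcode(first, second)
        families[barcode][order].append(read_id)
    return dict(families)


if __name__ == "__main__":
    example = [
        ("ACGT", "TTGA", "read1"),  # one strand
        ("TTGA", "ACGT", "read2"),  # the opposite strand of the same fragment
        ("GGCC", "AATT", "read3"),  # a different fragment
    ]
    for barcode, strands in group_by_fragment(example).items():
        print(barcode, strands)
```

Under these assumptions, the two strands of one fragment map to the same key, while the order label keeps the single-stranded families separable for later comparison.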
Summary
The term “genetic variation” is often used to imply allelic combinatorics within a diploid organism such as humans or Drosophila. Because high-throughput sequencing technologies exhibit considerable amounts of noise [3], it becomes increasingly difficult to reliably call variants with frequencies below 1 % [4,5,6,7,8,9]. In these situations, increased sequencing depth does not improve the predictive power but instead introduces additional noise. Today the vast majority of strategies for the identification of low-frequency sequence variants rely on next-generation sequencing technologies. Noise reduction in these approaches ranges from simple base quality filtering to complex statistical strategies incorporating instrument and mapping errors [4, 7, 14]. We demonstrate the application of this approach by validating rare variants in the human mitochondrial genome.