Abstract

Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results, and we apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components and integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1039-4) contains supplementary material, which is available to authorized users.

Highlights

  • The term “genetic variation” is often used to imply allelic combinatorics within a diploid organism such as humans or Drosophila

  • In order to group single-stranded families from the same fragment together, we normalize the order of the concatenation to produce a “canonical barcode”, which will be identical for both strands

  • The order of the canonical barcode is determined by a simple string comparison
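The barcode normalization described in the last two highlights can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the "ab"/"ba" order labels, and the example tag sequences are all hypothetical, and it assumes each read pair carries two tags whose concatenation order is reversed on the opposite strand.

```python
from collections import defaultdict


def canonical_barcode(tag1, tag2):
    """Return (canonical barcode, order label) for the two tags of a read pair.

    Assumption: the two strands of one fragment carry the same two tags in
    opposite order. A plain string comparison puts the lexicographically
    smaller tag first, so both strands yield the same canonical barcode.
    The order label (hypothetical names "ab" / "ba") records which way the
    tags were originally arranged, which keeps the two strands distinguishable.
    """
    if tag1 < tag2:
        return tag1 + tag2, "ab"
    # Also covers tag1 == tag2, where the strands cannot be told apart by order.
    return tag2 + tag1, "ba"


def group_by_fragment(read_pairs):
    """Group (tag1, tag2, read_id) tuples by canonical barcode, then by order."""
    families = defaultdict(lambda: defaultdict(list))
    for tag1, tag2, read_id in read_pairs:
        barcode, order = canonical_barcode(tag1, tag2)
        families[barcode][order].append(read_id)
    return families


# Made-up example: r1 and r2 come from one strand, r3 from the other strand
# of the same fragment, and r4 from an unrelated fragment.
pairs = [
    ("ACGT", "TTGA", "r1"),
    ("ACGT", "TTGA", "r2"),
    ("TTGA", "ACGT", "r3"),
    ("GGCC", "AATT", "r4"),
]
families = group_by_fragment(pairs)
```

In this sketch, reads r1 to r3 land under the same canonical barcode but in two order groups, one per single-stranded family, while r4 forms its own family. Comparing strings directly keeps the normalization deterministic without needing a reference sequence.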


Summary

Background

The term “genetic variation” is often used to imply allelic combinatorics within a diploid organism such as humans or Drosophila. Because high-throughput sequencing technologies exhibit considerable amounts of noise [3], it becomes increasingly difficult to reliably call variants with frequencies below 1% [4,5,6,7,8,9]. In these situations, increased sequencing depth does not improve the predictive power but instead introduces additional noise. Today, the vast majority of strategies for the identification of low-frequency sequence variants rely on next-generation sequencing technologies. Noise reduction in these approaches ranges from simple base-quality filtering to complex statistical strategies incorporating instrument and mapping errors [4, 7, 14]. We demonstrate the application of this approach by validating rare variants in the human mitochondrial genome.

Results and discussion
Conclusions
Methods