Abstract

DNA fragmentation is a fundamental step during library preparation in hybridization capture-based, short-read sequencing. Ultrasonication has been used thus far to prepare DNA of an appropriate size, but this method is associated with a considerable loss of DNA sample. More recently, studies have employed library preparation methods that rely on enzymatic fragmentation with DNA endonucleases to minimize DNA loss, particularly in nano-quantity samples. Yet, despite their wide use, the effect of enzymatic fragmentation on the resultant sequences has not been carefully assessed. Here, we performed pairwise comparisons of somatic variants called from the same tumor DNA samples prepared using ultrasonic and enzymatic fragmentation methods. Our analysis revealed a substantially larger number of recurrent artifactual SNVs/indels in endonuclease-treated libraries than in those created through ultrasonication. These artifacts were marked by palindromic structure in the genomic context, positional bias in sequenced reads, and multi-nucleotide substitutions. Taking advantage of these distinctive features, we developed a filtering algorithm to distinguish genuine somatic mutations from artifactual noise with high specificity and sensitivity. Removing this noise recovered the composition of the mutational signatures in the tumor samples. Thus, we provide an informatics algorithm as a solution to the sequencing errors produced as a consequence of endonuclease-mediated fragmentation, highlighted for the first time in this study.
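
Two of the artifact features named above, palindromic genomic context and positional bias within reads, can be illustrated with a minimal sketch. This is not the authors' filtering algorithm: the function names, the 6-base minimum palindrome length, and the 10-base read-end margin are all assumptions chosen for illustration only.

```python
# Hypothetical illustration only; not the published filter.
# All names and thresholds below are assumptions.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_palindromic_context(context: str, min_len: int = 6) -> bool:
    """True if `context` contains a stretch of at least `min_len` bases
    that equals its own reverse complement (a DNA palindrome)."""
    seq = context.upper()
    for start in range(len(seq) - min_len + 1):
        for end in range(start + min_len, len(seq) + 1):
            window = seq[start:end]
            # Skip windows with ambiguous bases.
            if "N" not in window and window == reverse_complement(window):
                return True
    return False


def near_read_end(pos_in_read: int, read_len: int, margin: int = 10) -> bool:
    """True if a 0-based position lies within `margin` bases of either
    end of the read (a simple proxy for positional bias)."""
    return pos_in_read < margin or pos_in_read >= read_len - margin
```

For example, a reference window around a candidate variant could be passed to `has_palindromic_context`, and the variant's offset in each supporting read to `near_read_end`. How such flags would be combined into a call, and at what thresholds, is not specified in the text above.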

Highlights

  • Next-generation sequencing (NGS) technologies have facilitated the delivery of precision medical care to patients with cancer

  • There tends to be a sufficient amount of matched normal DNA for standard processing and, although this is not ideal, somatic mutation calling is often performed on tumor and matched normal samples prepared using different DNA fragmentation methods

  • We noted that these tumor samples were prepared using the HyperPlus kit and that the paired normal DNA samples were prepared with the SureSelect kit

Summary

Introduction

Next-generation sequencing (NGS) technologies have facilitated the delivery of precision medical care to patients with cancer. Short-read sequencing technology has been widely exploited for this purpose, encompassing amplicon- or hybridization capture-based library preparations [1]. This diagnostic strategy relies on accurate sequencing and interpretation to provide patients with the right clinical decision [1, 2].

