Abstract
Background: Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is therefore essential that errors that constitute baseline noise and impose a practical limit on detection are characterized. In the present study, we systematically evaluate the extent to which errors are incurred during specific steps of the capture-based targeted sequencing process.

Results: We removed most sequencing artifacts by filtering out low-quality bases and then analyzed the remaining background noise. By recognizing that plasma DNA is naturally fragmented to a size comparable to that of mono-nucleosomal DNA, we were able to identify and characterize errors that are specifically associated with acoustic shearing. Two-thirds of C:G > A:T errors and one quarter of C:G > G:C errors were attributed to the oxidation of guanine during acoustic shearing, and this was further validated by comparative experiments conducted under different shearing conditions. The acoustic shearing step also caused A > G and A > T substitutions localized to the end bases of sheared DNA fragments, indicating a probable association of these errors with DNA breakage. Finally, the hybrid selection step accounted for one-third of the remaining C:G > A:T errors and one-fifth of the C > T errors.

Conclusions: The results of this study provide a comprehensive summary of the various errors incurred during targeted deep sequencing and their underlying causes. This information will be invaluable for driving technical improvements in this sequencing method, and may increase the future use of targeted deep sequencing for low-allelic fraction variant detection.
Highlights
Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is essential that errors that constitute baseline noise and impose a practical limit on detection are characterized
To exclude the possibility of systemic bias affecting either the library or sequencing data of the different sample types, the allele frequencies of single nucleotide polymorphisms (SNPs) were compared between matched plasma and peripheral blood leukocyte (PBL) samples. This analysis showed a strong correlation between SNP allele frequencies in plasma and PBL samples (R = 0.9913, p value < 0.0001; Additional file 3: Figure S1).
Errors introduced by the sequencing reaction: After excluding tumor-derived single nucleotide variants (SNVs) and germline SNPs (“Methods”), we investigated the extent to which background error was introduced during the sequencing run by graphing the Phred base quality scores of non-reference background alleles.
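As an illustration of what filtering on Phred base quality involves, the following minimal Python sketch decodes a FASTQ-style quality string and masks bases below a threshold. It is not the authors' pipeline: the function names, the Phred+33 encoding, and the cutoff of Q30 are all assumptions chosen for the example. The only fixed element is the standard Phred definition, Q = -10 · log10(P_error).

```python
def phred_to_error_prob(q):
    """Convert a Phred score to an error probability (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores.

    The offset of 33 (Phred+33 encoding) is an assumption for this sketch.
    """
    return [ord(ch) - offset for ch in qual_string]


def mask_low_quality(seq, qual_string, min_q=30):
    """Replace bases whose Phred score is below min_q with 'N'.

    min_q=30 is a hypothetical cutoff, not a value taken from the study.
    """
    quals = decode_quality(qual_string)
    return "".join(
        base if q >= min_q else "N"
        for base, q in zip(seq, quals)
    )


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(mask_low_quality("ACGT", "II#I"))
    print(phred_to_error_prob(30))
```

Under these assumptions, a base with Q30 has an estimated error probability of 1 in 1,000, which is why non-reference alleles supported only by low-quality bases are usually treated as sequencing artifacts rather than true low-allelic fraction variants.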
Summary
Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is essential that errors that constitute baseline noise and impose a practical limit on detection are characterized. Tens of thousands of tumors of varying types have been analyzed using next-generation sequencing (NGS) for systematic variant discovery [1, 2]. This has resulted in the comprehensive characterization of many cancer genomes and the identification of genetic alterations that are common to a variety of human tumor types [1, 3]. Errors caused by Illumina HiSeq sequencer chemistry are relatively well understood, and appropriate data filtering criteria based on this knowledge are routinely applied to generated data to remove them [23]. The depth of coverage of sample DNA after removal of duplication in FNA samples was on average