Abstract

Background: Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.

Results: We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced.

Conclusions: Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.
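The recommendation to keep primer bases through alignment and remove them only just before variant calling can be illustrated with a minimal sketch. It assumes that "removal" is done by soft-clipping the primer-covered bases of an already aligned read, which is one common way to do it and not necessarily the authors' implementation. The function name `soft_clip_primer`, the tuple-based CIGAR representation, and the 0-based coordinates are all hypothetical choices made for this example; only a forward primer at the left end of the read is handled.

```python
def soft_clip_primer(ref_start, cigar, primer_end):
    """Soft-clip the read bases aligned before `primer_end` (hypothetical helper).

    ref_start  : 0-based reference position of the first aligned base
    cigar      : list of (op, length) with op in {"M", "I", "D", "S"}
    primer_end : 0-based, exclusive reference position where the primer ends
    Returns (new_ref_start, new_cigar). The read length is preserved.
    """
    clipped = 0              # read bases that will become soft-clipped
    pos = ref_start          # current reference position
    remaining = list(cigar)

    # Walk the alignment until the primer region has been passed.
    while remaining and pos < primer_end:
        op, length = remaining.pop(0)
        if op in ("S", "I"):         # consumes read bases only
            clipped += length
        elif op == "D":              # consumes reference bases only
            pos += length
        elif op == "M":              # consumes both
            take = min(length, primer_end - pos)
            clipped += take
            pos += take
            if take < length:        # keep the part of the match past the primer
                remaining.insert(0, ("M", length - take))
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")

    # An alignment should not resume with an indel right after a soft clip.
    while remaining and remaining[0][0] in ("S", "I", "D"):
        op, length = remaining.pop(0)
        if op == "D":
            pos += length
        else:
            clipped += length

    new_cigar = ([("S", clipped)] if clipped else []) + remaining
    return pos, new_cigar
```

Because the primer bases are still present while the aligner runs, a variant at the first insert base has matching sequence on both sides. Clipping afterwards keeps the primer bases out of variant calling without giving up that alignment context.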

Highlights

  • Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data

  • We present results from variant calling on simulated amplicon sequencing reads where each read has a variant near the edges of the amplicon, with a varying number of primer bases remaining after primer digestion

  • Our results clearly show the presence of blind spots near the read ends in amplicon sequencing data
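A toy scoring model can help show why such blind spots arise and why retained primer bases shrink them. The sketch below is not the authors' simulation. It assumes a simple local-alignment scheme in which a match scores +1, a mismatch costs 4, and soft-clipping the read tail is free; these values and the function names `tail_is_clipped` and `variant_is_visible` are illustrative assumptions, and only a single-base substitution is modelled.

```python
def tail_is_clipped(bases_after_variant, match=1, mismatch=4, clip_penalty=0):
    """True if clipping the read tail outscores aligning through the mismatch.

    The tail is the variant base plus the `bases_after_variant` bases that
    follow it up to the read end. Scoring values are illustrative assumptions.
    """
    align_through = bases_after_variant * match - mismatch
    clip = -clip_penalty
    return clip > align_through


def variant_is_visible(distance_to_insert_edge, primer_bases_retained, **scoring):
    """True if the variant base stays aligned instead of being soft-clipped.

    distance_to_insert_edge : insert bases between the variant and the insert edge
    primer_bases_retained   : primer bases still present at that read end
    """
    bases_after_variant = distance_to_insert_edge + primer_bases_retained
    return not tail_is_clipped(bases_after_variant, **scoring)


if __name__ == "__main__":
    # For each number of retained primer bases, list the distances from the
    # insert edge (0 to 5) at which a substitution would remain aligned.
    for primer_bases in (0, 2, 5, 10):
        visible = [d for d in range(6) if variant_is_visible(d, primer_bases)]
        print(primer_bases, visible)
```

Under these assumed scores, a substitution needs at least four matching bases beyond it to stay aligned. With no primer bases those must all come from the insert, so the outermost insert positions are lost; each retained primer base moves that boundary one position outward, into the primer region.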


Introduction

Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Multiple strategies are currently in use for target enrichment [1]. They can be broadly classified into PCR-based methods and hybridization-based methods. Hybridization-based enrichment is by far the most widely used approach for large target regions such as targeted exome sequencing [2], in which all protein-coding regions and the untranslated regions flanking them are targeted. Both hybridization-based and PCR-based enrichment strategies are often used for smaller target regions
