Abstract

Summary: amplimap is a command-line tool to automate the processing and analysis of data from targeted next-generation sequencing experiments with PCR-based amplicons or capture-based enrichment systems. From raw sequencing reads, amplimap generates output such as read alignments, annotated variant calls, target coverage statistics and variant allele counts and frequencies for each target base pair. In addition to its focus on user-friendliness and reproducibility, amplimap supports advanced features such as consensus base calling for read families based on unique molecular identifiers and filtering false positive variant calls caused by amplification of off-target loci.

Availability and implementation: amplimap is available as a free Python package under the open-source Apache 2.0 License. Documentation, source code and installation instructions are available at https://github.com/koelling/amplimap.

Highlights

  • Targeted next-generation sequencing (NGS), for example from PCR-generated amplicons or capture-based methods, is widely used for screening of candidate disease genes in patient cohorts (Fenwick et al, 2016) or for quantification of variant allele frequencies (VAFs) to detect allele-specific expression or mosaic mutations (Bernkopf et al, 2017; Reijnders et al, 2018). Recently, targeted NGS techniques have been extended to redundantly sequence the same original molecule of DNA multiple times to achieve very low error rates (Salk et al, 2018).

  • This enables the detection of somatic, sub-clonal mutations from cancer samples or mosaicism down to low levels (Acuna-Hidalgo et al, 2017; Maher et al, 2018). These high-fidelity protocols typically rely on the inclusion of unique molecular identifier (UMI) sequences, for example with single-molecule molecular inversion probes.

  • Significant computational work needs to be carried out to translate the raw sequencing reads generated by these protocols into interpretable genomic data, such as variant calls or VAFs.

Introduction

Targeted next-generation sequencing (NGS), for example from PCR-generated amplicons or capture-based methods, is widely used for screening of candidate disease genes in patient cohorts (Fenwick et al, 2016) or for quantification of variant allele frequencies (VAFs) to detect allele-specific expression or mosaic mutations (Bernkopf et al, 2017; Reijnders et al, 2018). Targeted NGS techniques have been extended to redundantly sequence the same original molecule of DNA multiple times to achieve very low error rates (Salk et al, 2018). This enables the detection of somatic, sub-clonal mutations from cancer samples or mosaicism down to low levels (Acuna-Hidalgo et al, 2017; Maher et al, 2018). A common challenge is the unintended amplification of highly homologous loci, such as pseudogenes (Claes and De Leeneer, 2014). These loci may be amplified when primers inadvertently hybridize to highly homologous regions, creating chimeric reads that may lead to false variant calls (Fig. 1a). Such false positives are often only identified through manual comparison to pseudogene sequences. Additional target coverage tables give an overview of how thoroughly each target was sequenced in each sample.
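
The idea of redundantly sequencing the same original molecule to reduce errors can be illustrated with a minimal Python sketch. It is not amplimap's implementation: the function name `consensus_by_umi`, its parameters `min_family_size` and `min_agreement`, and the toy reads are all hypothetical, and the sketch assumes that reads sharing a UMI are already aligned to the same start position. Reads are grouped into families by UMI, and each position is called by majority vote, with `N` emitted where agreement is too low.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_family_size=2, min_agreement=0.8):
    """Call a per-position consensus for each UMI read family.

    Hypothetical helper for illustration only (not part of amplimap).
    `reads` is an iterable of (umi, sequence) pairs; sequences within a
    family are assumed to start at the same position.
    """
    # Group reads into families that share the same UMI.
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to support a consensus call
        length = min(len(s) for s in seqs)  # compare only the shared prefix
        bases = []
        for i in range(length):
            base, n = Counter(s[i] for s in seqs).most_common(1)[0]
            # Emit the majority base only if enough reads agree on it.
            bases.append(base if n / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Toy data: (UMI, read sequence) pairs.
reads = [
    ("AACG", "ACGTAC"),
    ("AACG", "ACGTAC"),
    ("AACG", "ACGAAC"),  # one read disagrees at the fourth base
    ("TTGA", "ACCTAC"),
    ("TTGA", "ACCTAC"),
    ("GGCT", "ACGTAC"),  # singleton family, dropped by default
]
result = consensus_by_umi(reads, min_agreement=0.6)
print(result)
```

With the stricter default threshold of 0.8, the disagreeing position in the first family would be reported as `N` rather than resolved to the majority base. This shows the trade-off between sensitivity and error suppression that such thresholds control.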

Primer trimming and detection of off-target events
Read family consensus calls
Tutorials