Abstract
SUMMARY amplimap is a command-line tool to automate the processing and analysis of data from targeted next-generation sequencing experiments with PCR-based amplicons or capture-based enrichment systems. From raw sequencing reads, amplimap generates output such as read alignments, annotated variant calls, target coverage statistics and variant allele counts and frequencies for each target base pair. In addition to its focus on user-friendliness and reproducibility, amplimap supports advanced features such as consensus base calling for read families based on unique molecular identifiers and filtering false positive variant calls caused by amplification of off-target loci. AVAILABILITY AND IMPLEMENTATION amplimap is available as a free Python package under the open-source Apache 2.0 License. Documentation, source code and installation instructions are available at https://github.com/koelling/amplimap.
Highlights
Targeted next-generation sequencing (NGS), for example from PCR-generated amplicons or capture-based methods, is widely used for screening of candidate disease genes in patient cohorts (Fenwick et al, 2016) or for quantification of variant allele frequencies (VAFs) to detect allele-specific expression or mosaic mutations (Bernkopf et al, 2017; Reijnders et al, 2018).Recently, targeted NGS techniques have been extended to redundantly sequence the same original molecule of DNA multiple times to achieve very low error rates (Salk et al, 2018)
This enables the detection of somatic, sub-clonal mutations from cancer samples or mosaicism down to low levels (Acuna-Hidalgo et al, 2017; Maher et al, 2018). These high-fidelity protocols typically rely on the inclusion of unique molecular identifier (UMI) sequences, for example with single-molecule molecular inversion probes
Significant computational work needs to be carried out to translate the raw sequencing reads generated by these protocols into interpretable genomic data, such as variant calls or VAFs
Summary
Targeted next-generation sequencing (NGS), for example from PCR-generated amplicons or capture-based methods, is widely used for screening of candidate disease genes in patient cohorts (Fenwick et al, 2016) or for quantification of variant allele frequencies (VAFs) to detect allele-specific expression or mosaic mutations (Bernkopf et al, 2017; Reijnders et al, 2018). Targeted NGS techniques have been extended to redundantly sequence the same original molecule of DNA multiple times to achieve very low error rates (Salk et al, 2018) This enables the detection of somatic, sub-clonal mutations from cancer samples or mosaicism down to low levels (Acuna-Hidalgo et al, 2017; Maher et al, 2018). A common challenge is the unintended amplification of highly homologous loci, such as pseudogenes (Claes and De Leeneer, 2014) These loci may be amplified when primers inadvertently hybridize to highly homologous regions, creating chimeric reads that may lead to false variant calls (Fig. 1a). Such false positives are often only identified through manual comparison to pseudogene sequences. Additional target coverage tables give an overview of how thoroughly each target was sequenced in each sample
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.