Abstract
Somatic small mutations (single nucleotide variants [SNVs] and small insertions/deletions [indels]) and copy number alterations are the two categories of mutations with the largest impact on tumors. The Broad Institute has released somatic variant calling workflows, based on the Genome Analysis Toolkit (GATK), for small mutations (Mutect2, M2) and for copy number alterations (ModelSegments). The suite of workflows can call variants in capture or whole-genome sequencing data and includes functional annotation with Funcotator, such as protein change (for small variants) and affected gene (for all variants). Common sequencing artifacts, such as those arising from oxidative DNA damage, FFPE-related deamination, or mapping errors, are corrected automatically. Evaluation of the workflows is standardized and repeatable, which allows performance to be tracked across versions, both for detection (e.g., sensitivity, precision) and for runtime (e.g., CPU and RAM usage). A matched normal is not required for a given tumor sample, since the workflows can leverage pre-processed panels of normals (PoNs). The workflows are freely available, portable (i.e., they can be run on local, on-premises, or cloud compute), optimized for cost, and tunable to make the best use of available compute.
The measured sensitivity of M2 was at least 0.93 for SNVs and at least 0.83 for indels on the DREAM1, DREAM2, and DREAM3 challenges and on a titrated mixture of germline samples (≥100x depth, allele fraction [AF] = 0.2). The measured precision of M2 ranged from 0.91 to 0.98 on DREAM1, DREAM2, and DREAM3 for both SNVs and indels. On twelve paired replicate normal-normal samples, the false positive rate (FPR) of M2 was between 0.03 and 0.21 FP/Mb for SNVs and between 0.0 and 0.1 FP/Mb for indels. The M2 workflow costs about USD$1.15 per pair of 35x WGS matched tumor-normal samples on Google Cloud Compute and required about 32 hours of CPU time on a single core with 3 GB RAM.
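The abstract reports M2 sensitivity, precision, and FPR but does not describe the evaluation harness. The sketch below assumes the standard definitions (sensitivity = TP/(TP+FN), precision = TP/(TP+FP), FPR = false positives per megabase of evaluated territory). The helper names (`CallCounts`, `sensitivity`, `precision`, `fp_per_mb`) are hypothetical and are not GATK APIs, and the counts and the 3 Gb territory are illustrative only, not values from the DREAM or normal-normal evaluations.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Hypothetical container for a call-set vs. truth-set comparison (not a GATK type)."""
    tp: int  # true positives: called and present in the truth set
    fp: int  # false positives: called but absent from the truth set
    fn: int  # false negatives: present in the truth set but not called


def sensitivity(c: CallCounts) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def precision(c: CallCounts) -> float:
    """Fraction of calls that are true variants: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def fp_per_mb(false_positives: int, territory_bp: int) -> float:
    """False positives per megabase of evaluated territory."""
    return false_positives / (territory_bp / 1_000_000)


if __name__ == "__main__":
    # Illustrative counts only; not taken from the abstract's evaluations.
    snvs = CallCounts(tp=930, fp=40, fn=70)
    print(f"sensitivity = {sensitivity(snvs):.3f}")  # 930 / 1000
    print(f"precision   = {precision(snvs):.3f}")    # 930 / 970
    # A normal-normal replicate pair contains no true somatic variants,
    # so every call made on it counts as a false positive.
    print(f"FPR = {fp_per_mb(90, 3_000_000_000):.3f} FP/Mb")  # 90 / 3000 Mb
```

The normal-normal design is what makes FP/Mb easy to measure: with no true somatic events present, the call count divided by the evaluated territory is the false positive rate directly.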
The measured sensitivity of ModelSegments was at least 0.91 for both deletions and amplifications across three cohorts of TCGA whole-exome samples (stomach adenocarcinoma, N=39; thyroid carcinoma, N=50; lung adenocarcinoma, N=60). The measured specificity on the same cohorts was at least 0.96 for both deletions and amplifications. All ModelSegments results reported here use the corresponding SNP array results as the truth set. ModelSegments cost approximately USD$0.65 per 30x WGS pair on Google Cloud Compute and required about 6 hours of CPU time on a single core. RAM usage was adjusted automatically within the workflow to minimize cost and ranged from 2 to 13 GB.
Citation Format: Lee Lichtenstein, Jonn Smith, David Benjamin, Aaron Chevalier, Kristian Cibulskis, Samuel K. Lee, Eric Banks. Somatic small variant and copy number alteration calling with the Genome Analysis Toolkit [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5108.
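The ModelSegments sensitivity and specificity figures above are measured against SNP array calls, but the abstract does not say how calls were matched to the truth (per segment, per gene, or per base). The sketch below assumes, purely for illustration, parallel per-region labels drawn from {"DEL", "AMP", "NEUTRAL"} and computes sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) for one event type at a time. The function name `sens_spec` and the label scheme are hypothetical, not part of GATK.

```python
from typing import Sequence, Tuple


def sens_spec(truth: Sequence[str], called: Sequence[str], event: str) -> Tuple[float, float]:
    """Sensitivity and specificity for one event type ("DEL" or "AMP").

    Hypothetical helper: truth and called are parallel per-region labels
    (e.g., one label per gene or segment) from {"DEL", "AMP", "NEUTRAL"}.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must be the same length")
    pairs = list(zip(truth, called))
    tp = sum(t == event and c == event for t, c in pairs)  # event found
    fn = sum(t == event and c != event for t, c in pairs)  # event missed
    tn = sum(t != event and c != event for t, c in pairs)  # correctly not called
    fp = sum(t != event and c == event for t, c in pairs)  # spurious call
    # Raises ZeroDivisionError if the truth set has no events (or no non-events)
    # of this type; a real evaluation would need to handle that case.
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Illustrative labels only; not data from the TCGA cohorts.
    truth = ["DEL", "DEL", "NEUTRAL", "AMP", "NEUTRAL", "DEL", "AMP", "NEUTRAL"]
    called = ["DEL", "NEUTRAL", "NEUTRAL", "AMP", "DEL", "DEL", "AMP", "NEUTRAL"]
    print("DEL:", sens_spec(truth, called, "DEL"))  # sensitivity 2/3, specificity 4/5
    print("AMP:", sens_spec(truth, called, "AMP"))  # sensitivity 1.0, specificity 1.0
```

Scoring deletions and amplifications separately mirrors how the abstract reports them: a region called as an amplification still counts as a true negative when scoring deletions.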