Abstract 852: Improved tumor-only somatic variant calling using a gradient boosted machine learning algorithm

Nicholas Phillips,John West,Jason Harris,Richard Chen,Patrick Jongeneel

doi:10.1158/1538-7445.am2020-852

Abstract

Background: Accurate identification of somatic variants in a tumor sample is often enabled by a paired normal tissue sample from the same patient, which allows private germline mutations to be separated from somatic variant calls. However, a paired normal sample is not always available, making accurate somatic variant analysis more challenging. Composite proxy normals and other filtering approaches can be used in lieu of a paired normal sample, but the resulting somatic call set may suffer from incomplete germline filtering and reduced sensitivity compared to paired tumor-normal analysis. To address these limitations, we developed a novel, machine learning-based tumor-only somatic small variant classifier, which leverages gradient boosted decision trees to substantially increase somatic variant specificity in tumor-only analysis without reducing overall sensitivity.

Methods: We produced a ground truth set of somatic SNVs and indels from 350 whole exome-sequenced tumor-normal pairs using a validated cancer bioinformatics pipeline. We then generated a feature set from each tumor sample by aggregating pileup attributes including: allele frequency and read depth, tumor cellularity estimates, germline variant calls from HaplotypeCaller, somatic variant calls from Mutect and Mutect2 using a proxy normal, copy-number alterations, annotations from databases such as gnomAD and COSMIC, and problematic-region annotations including homopolymers. Using these features and the ground truth set, we trained a gradient boosted decision tree model to predict the somatic likelihood of each variant. Model hyperparameters were optimized using a random search during stratified cross-validation, and model performance was evaluated on a hold-out test set.
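The training procedure described in the Methods (gradient boosted trees, random hyperparameter search, stratified cross-validation, hold-out evaluation) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' pipeline: the two features (`population_af`, `tumor_vaf`), their synthetic distributions, the one-split "stump" trees, the log-loss selection criterion, and the search ranges are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-split tree (feature, threshold, left value, right value)
    that best fits the residuals in the least-squares sense."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def fit_gbm(X, y, n_rounds, learning_rate):
    """Simplified gradient boosting on log-loss: each round fits a stump to the
    pseudo-residuals (label minus predicted probability) and adds a shrunken copy."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # initial score: log-odds of the somatic class
    scores, stumps = [f0] * len(X), []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps))

def stratified_folds(y, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def log_loss(y, probs):
    eps = 1e-12
    return -sum(yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1.0 - p, eps))
                for yi, p in zip(y, probs)) / len(y)

def cv_log_loss(X, y, params, folds):
    """Mean held-out log-loss across the folds for one hyperparameter setting."""
    losses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_gbm([X[i] for i in train], [y[i] for i in train], **params)
        losses.append(log_loss([y[i] for i in held_out],
                               [predict_proba(model, X[i]) for i in held_out]))
    return sum(losses) / len(losses)

def random_search(X, y, n_candidates, k, rng):
    """Sample hyperparameters at random; keep the setting with the lowest CV loss."""
    folds = stratified_folds(y, k, rng)
    best_params, best_loss = None, float("inf")
    for _ in range(n_candidates):
        params = {"n_rounds": rng.randint(15, 30),          # assumed search range
                  "learning_rate": rng.uniform(0.2, 0.6)}   # assumed search range
        loss = cv_log_loss(X, y, params, folds)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

def make_variant(rng, somatic):
    """Synthetic [population_af, tumor_vaf]; the distributions are invented."""
    if somatic:
        return [rng.uniform(0.0, 0.002), rng.uniform(0.05, 0.45)]
    return [rng.uniform(0.001, 0.5), rng.uniform(0.35, 0.65)]

rng = random.Random(0)
y_all = [1] * 40 + [0] * 80                      # 1 = somatic, 0 = germline
X_all = [make_variant(rng, label == 1) for label in y_all]
test_idx = [i for i in range(len(y_all)) if i % 4 == 0]   # hold-out test set
train_idx = [i for i in range(len(y_all)) if i % 4 != 0]
X_train, y_train = [X_all[i] for i in train_idx], [y_all[i] for i in train_idx]

best_params, best_loss = random_search(X_train, y_train, n_candidates=4, k=3, rng=rng)
model = fit_gbm(X_train, y_train, **best_params)
test_probs = [predict_proba(model, X_all[i]) for i in test_idx]
test_accuracy = sum((p >= 0.5) == (y_all[i] == 1)
                    for p, i in zip(test_probs, test_idx)) / len(test_idx)
print(best_params, round(best_loss, 3), round(test_accuracy, 3))
```

In practice a production library for gradient boosted trees would replace the hand-written stumps; the point here is only the shape of the workflow: folds are built per class so each fold preserves the somatic/germline ratio, candidate hyperparameters are sampled rather than enumerated, and the hold-out set is touched only once, after the search.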
Results: Using a classification threshold that optimized F1 score on the validation set, we observed a significant increase in model precision on the test set, with sensitivity comparable to somatic calling using a conventional proxy-normal filtering approach. Because our model outputs a somatic probability, the classification threshold can be tuned to favor sensitivity or specificity of the call set, depending on the desired use case. To improve interpretability of our model, we employed Shapley additive explanations (SHAP) to obtain feature importance values. Our analysis revealed that annotations such as population frequency and base quality scores were among the most important features.

Conclusions: Our machine learning approach can greatly enhance germline filtering for somatic variant calling when a paired normal sample is not available, without decreasing sensitivity for true somatic variants. Depending on the use case, classification thresholds can be tuned to improve sensitivity over conventional variant callers in exchange for more modest improvements in precision. Finally, model interpretation has revealed a subset of highly discriminative features, which may prove useful for variant interpretation, future feature set expansion, or model tuning.

Citation Format: Nicholas Phillips, Patrick Jongeneel, John West, Richard Chen, Jason Harris. Improved tumor-only somatic variant calling using a gradient boosted machine learning algorithm [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 852.
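The threshold selection mentioned in the Results (choosing the cutoff on the somatic probability that maximizes F1 on a validation set) can be illustrated with a short sketch. The validation labels and probabilities below are made up for the example, and the helper names are hypothetical; the abstract does not describe the authors' actual implementation.

```python
def precision_recall_f1(y_true, probs, threshold):
    """Call a variant somatic when its probability is >= threshold, then score the calls."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_f1_threshold(y_true, probs):
    """Try every observed probability as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: precision_recall_f1(y_true, probs, t)[2])

# Made-up validation set: 1 = somatic, 0 = germline, with model probabilities.
val_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
val_probs = [0.95, 0.90, 0.70, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]

best_threshold = best_f1_threshold(val_labels, val_probs)
print("F1-optimal threshold:", best_threshold)
for t in (best_threshold, 0.70):
    precision, recall, f1 = precision_recall_f1(val_labels, val_probs, t)
    print(f"threshold={t:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the F1-optimal cutoff is 0.40 (precision 0.80, recall 1.00), while raising the cutoff to 0.70 trades recall for precision (precision 1.00, recall 0.75), which is the kind of sensitivity/specificity tuning the abstract describes.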
