Abstract

Blood-based liquid biopsies are becoming the standard for noninvasive profiling of tumors from circulating tumor DNA (ctDNA) to facilitate treatment decisions. However, non-tumor-derived mutations, such as those arising from clonal hematopoiesis (CH), are also present in cell-free DNA (cfDNA) and can complicate tumor mutation detection and interpretation. Filtering alterations associated with CH can reduce the risk of false positives in ctDNA analysis. Current strategies for filtering CH mutations rely on prior knowledge of mutations in canonical CH genes, on sequencing a matched peripheral blood mononuclear cell sample, or on using multiple time points to evaluate changes in allele frequency (AF), under the assumption that the AF of CH of indeterminate potential (CHIP) mutations is unlikely to be affected by treatment; however, these approaches are either insufficient or cost-prohibitive. We present a machine learning framework that incorporates prior knowledge from tumor- and blood-derived mutations to improve the classification of CH and tumor variants from plasma-only cfDNA samples and that outperforms a simple rule-based approach. The framework is composed of three main modules designed to maximize the information gained from ctDNA and non-ctDNA samples: (1) a sequence context-based model in which each variant is represented by 128 mutational signatures extracted using bag-of-words-like modeling (using normal and tumor samples), along with mutation functional effect scores from SnpSift and cancer type of origin; in total, 6,745 CH and 57,210 somatic tumor-derived mutations were used to train this model to distinguish CH, CH-putative cancer driver, and tumor mutations; (2) a ctDNA-context model that includes patient-level information (age, mutational signatures, and variant allele frequencies) from 392 tumor and 481 CH blood-derived mutations extracted exclusively from ctDNA samples to predict blood or tumor origin; and (3) a meta-classifier in which models 1 and 2 are aggregated into a single score for predicting the origin of a given mutation. We compared our models against a baseline strategy in which mutations were predicted as CH if they fell into a list of 27 canonical CH-related genes. Our proposed meta-classifier, which integrates sequence- and ctDNA-based features, improved the overall performance of CH detection (AUC = 0.83, compared with AUC = 0.72 for the baseline model), highlighting the value of adding ctDNA-derived and mutation-context features to profile somatic mutations. As ctDNA assays and technologies rapidly progress and gain adoption for precision oncology, we anticipate that our work, combined with existing benchmarking efforts, will enable more robust analyses in critical applications such as early disease detection, minimal residual disease, and mutational profiling, ultimately to better inform clinical decision-making.

Citation Format: Gustavo Arango-Argoty, Gerald Sun, Aleksandra Markovets, Carl Barrett, Zhongwu Lai, Etai Jacob. Improved identification of CHIP mutations from cell-free DNA without matched normal samples using machine learning. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5360.
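For illustration only, the rule-based baseline described in the abstract (call a mutation CH if its gene is on a list of canonical CH-related genes) might be sketched as follows. The abstract does not enumerate its 27 genes, so the three-gene set here is a hypothetical placeholder, and the function name `baseline_predict` is likewise an assumption rather than the authors' code.

```python
# Hypothetical placeholder: the abstract's actual 27-gene list is not given.
# DNMT3A, TET2 and ASXL1 are used only as commonly cited CH-associated genes.
CANONICAL_CH_GENES = frozenset({"DNMT3A", "TET2", "ASXL1"})


def baseline_predict(gene: str, ch_genes: frozenset = CANONICAL_CH_GENES) -> str:
    """Label a mutation 'CH' if its gene is on the list, otherwise 'tumor'."""
    return "CH" if gene.strip().upper() in ch_genes else "tumor"


if __name__ == "__main__":
    # Toy gene symbols, not data from the study.
    for gene in ["TET2", "KRAS", "dnmt3a"]:
        print(gene, baseline_predict(gene))
```

A rule of this kind uses only the gene name. It ignores sequence context, allele frequency, and patient age, which are the features the abstract's sequence-context and ctDNA-context models add.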
