Abstract

Lung cancer is the second most prevalent cancer and the leading cause of cancer-related deaths in the US. Because the survival rate of patients diagnosed with lung cancer at earlier stages is dramatically higher than that of patients diagnosed at later stages, technologies that shift lung cancer diagnosis to an earlier stage would be expected to improve survival rates. Fragmentomics, the study of short circulating cell-free DNA (cfDNA) fragments in the blood, is an emerging field in cancer diagnostics that has the potential to catalyze such a shift. Studies have shown that fragmentomics features, such as fragment sizes and end motifs, can be used to discriminate normal cfDNA from circulating tumor DNA (ctDNA) generated through tumor necrosis and apoptosis. Building on this knowledge, machine learning tools trained to identify ctDNA-specific features are being leveraged in novel liquid biopsy assays to improve early-stage lung cancer detection. To create a machine learning classifier that detects discrete signals of ctDNA in individuals with lung cancer, Genece Health performed low-pass (∼3x coverage) whole genome sequencing (Illumina) of plasma-extracted cfDNA from >400 lung cancer samples and >1,000 negative controls. These data were used to generate our novel FEMS (Fragment End Motifs and Sizes) dataframe, containing >30,000 features, by combining fragment lengths with their corresponding 5’ 4bp end motifs. The FEMS dataframe was then used to train a neural network classifier. To evaluate the performance of the classifier, we investigated whether its scores correlated with well-known hallmarks of ctDNA. First, consistent with published ctDNA characteristics, we observed significant (p<0.01) enrichment of fragment sizes ≤167bp in lung cancer samples. Concordantly, our classifier scores were significantly correlated with this fragment size enrichment.
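As an illustration of how fragment lengths might be combined with 5’ 4bp end motifs into a single feature table, the following minimal Python sketch counts (length, motif) pairs per sample and computes the fraction of fragments ≤167bp. The function names, the 50-250bp length window, the normalization to fractions, and the toy input are all assumptions made for this sketch; the abstract does not specify how the FEMS dataframe is actually built.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 256 possible 4-base motifs, in a fixed order.
MOTIFS = ["".join(p) for p in product(BASES, repeat=4)]


def fems_features(fragments, min_len=50, max_len=250):
    """Map each (length, motif) pair to the fraction of fragments carrying it.

    `fragments` is an iterable of (length_bp, five_prime_4mer) pairs.
    The length window and the normalization to fractions are assumptions
    of this sketch, not details taken from the abstract.
    """
    counts = Counter()
    for length, motif in fragments:
        motif = motif.upper()
        # Skip fragments outside the window or with ambiguous bases (e.g. N).
        in_window = min_len <= length <= max_len
        if in_window and len(motif) == 4 and set(motif) <= set(BASES):
            counts[(length, motif)] += 1
    total = sum(counts.values())
    return {
        (length, motif): (counts[(length, motif)] / total if total else 0.0)
        for length in range(min_len, max_len + 1)
        for motif in MOTIFS
    }


def short_fragment_fraction(fragments, cutoff=167):
    """Fraction of fragments with length <= cutoff (167bp in the abstract)."""
    lengths = [length for length, _ in fragments]
    if not lengths:
        return 0.0
    return sum(length <= cutoff for length in lengths) / len(lengths)


# Toy input, invented purely to exercise the functions.
toy = [(128, "CCCA"), (128, "CCCA"), (167, "CCTG"), (204, "AAAA")]
features = fems_features(toy)
print(len(features))                 # 201 lengths x 256 motifs = 51456
print(features[(128, "CCCA")])       # 0.5
print(short_fragment_fraction(toy))  # 0.75
```

With this hypothetical window the table has 51,456 columns per sample; the >30,000 features reported in the abstract imply a different, unstated choice of length range or binning.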
Both fragment size enrichment and our classifier scores were also correlated with lung cancer stage, with higher enrichment and higher scores observed in later stages. Next, we show that our classifier scores are correlated with specific end motif enrichments or depletions in lung cancer samples. For example, in lung cancer samples, CCCA end motifs were significantly enriched (p<0.01) in fragment sizes 120-140bp (max at 128bp), while significantly depleted in fragment sizes 190-210bp (min at 204bp). Once again, our classifier scores were significantly correlated with both the enrichment of this end motif in short fragments and its depletion in long fragments. Finally, we assessed the classifier’s performance in an independent cohort (46 lung cancer and 132 non-cancer samples) and observed 89.2% sensitivity with 90.4% specificity (auROC=0.957). The correlation observed between our classifier scores and published ctDNA characteristics, combined with the strong performance in an independent cohort, suggests that Genece’s novel machine learning classifier is effectively leveraging ctDNA-specific fragmentomics features for the identification of lung cancer in patient samples.

Citation Format: Carlos Guzman, Michael L Salmans, Mengchi Wang, Byung In Lee, Andrew R Carson. Novel machine learning lung cancer classifier shows significant correlation with cancer-specific fragmentomics features [abstract]. In: Proceedings of the AACR Special Conference: Liquid Biopsy: From Discovery to Clinical Implementation; 2024 Nov 13-16; San Diego, CA. Philadelphia (PA): AACR; Clin Cancer Res 2024;30(21_Suppl):Abstract nr B058.
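The independent-cohort metrics quoted in the abstract (sensitivity, specificity, auROC) can be illustrated with a short Python sketch. The function names, the 0.5 decision threshold, and the toy scores are assumptions for illustration only; the abstract does not state the threshold or the scoring scale used by the classifier.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold is called positive.

    `labels` uses 1 for lung cancer and 0 for non-cancer.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Equivalent to the probability that a randomly chosen cancer sample
    scores higher than a randomly chosen non-cancer sample.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores, invented for illustration: three cancer and four non-cancer samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = auroc(scores, labels)
print(round(sens, 3), round(spec, 3), round(auc, 3))  # 0.667 0.75 0.917
```

Sensitivity and specificity depend on the chosen threshold, whereas auROC summarizes ranking quality across all thresholds, which is why the abstract reports both.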