Abstract

Fragmentomics has emerged as a powerful tool for detecting early-stage cancers. One challenge for fragmentomics is the low signal-to-noise ratio in early-stage cancer samples. Fragmentomic sequencing data are also inherently noisy and susceptible to batch effects. Common strategies to address these challenges use normalization methods, such as frequency or Z-score normalization, to scale the data. However, these methods struggle to differentiate true signal from noise and remain susceptible to experimental bias. To address the need for a strategy that removes both noise and batch effects, Genece Health is employing a novel normalization technique, the REFINE method, to improve fragmentomic cancer detection. Whole-genome sequencing (Illumina) fragmentomic data were generated at an average coverage of 2.5x from 3,669 normal and 1,346 cancer samples. Six cancer types were represented: colorectal, esophageal, liver, lung, ovarian, and pancreatic. Fragment end-motif and fragment size (FEMS) data were then extracted from the sequence data. REFINE first establishes a baseline FEMS signal from a representative, randomly selected panel of normals (PoN) comprising 100 healthy samples. This baseline, which represents the normal background FEMS noise in healthy donors, is then decomposed using truncated singular value decomposition. REFINE then removes the decomposed baseline FEMS signal from the raw FEMS data of all other samples using linear regression. Convolutional neural network (CNN) models were trained on 2,206 normal and 1,000 cancer samples with 5-fold cross-validation, one on frequency-normalized data and one on REFINE-normalized data. The performance of both models was then assessed on the remaining 1,463 normal and 346 cancer samples. At a threshold set to 90% specificity on the training dataset, REFINE significantly improves the sensitivity of the CNN model by more than 30 percentage points (from 50.4% to 83.0%).
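The baseline-removal steps described above (truncated SVD of a PoN matrix, then linear regression against the decomposed baseline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the number of retained components, any centering or scaling, or the regression details, so `n_components`, the function names `fit_pon_baseline` and `remove_baseline`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_pon_baseline(pon, n_components):
    """Truncated SVD of a panel-of-normals matrix (samples x features).

    Returns the top right singular vectors, shape (n_components, n_features).
    The choice of n_components is an assumption; the abstract does not state it.
    """
    # Thin SVD; NumPy returns singular values in descending order.
    _, _, vt = np.linalg.svd(pon, full_matrices=False)
    return vt[:n_components]

def remove_baseline(samples, components):
    """Regress each sample on the baseline components and keep the residual."""
    # Least-squares fit per sample: samples.T ~ components.T @ coefs
    coefs, *_ = np.linalg.lstsq(components.T, samples.T, rcond=None)
    fitted = (components.T @ coefs).T
    return samples - fitted

# Synthetic stand-in for FEMS features (all values hypothetical).
rng = np.random.default_rng(0)
n_features = 32
background = rng.normal(size=(3, n_features))  # shared background directions
pon = rng.normal(size=(100, 3)) @ background + 0.01 * rng.normal(size=(100, n_features))
samples = rng.normal(size=(5, 3)) @ background + 0.01 * rng.normal(size=(5, n_features))

components = fit_pon_baseline(pon, n_components=3)
cleaned = remove_baseline(samples, components)
```

In this sketch the residual `cleaned` is orthogonal to the retained PoN components, so whatever variation the healthy panel explains is removed before the data reach a downstream classifier.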
REFINE also improves the auROC from 0.753 to 0.944. We also show that increasing the size of the PoN to 812 samples and adding hyperparameter optimization further improves the sensitivity to 85.9% (with an auROC of 0.955). The REFINE method demonstrates a dramatic enhancement in the detection of ctDNA from fragmentomic data, overcoming the limitations of conventional normalization techniques. REFINE effectively enhances the signal-to-noise ratio, facilitating more accurate cancer detection. Future work will focus on validating the REFINE method across diverse datasets and exploring its application in other areas of oncology.

Citation Format: Mengchi Wang, Jin Mo Ahn, Junnam Lee, Dasom Kim, Eun-Hae Cho, Byung In Lee, Andrew Carson. REFINE Method: Novel strategy for signal enhancement [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4928.
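The reported sensitivities are measured at a score cutoff fixed on the training data at 90% specificity and then applied to held-out samples. A minimal sketch of that evaluation pattern is shown below. The quantile-based cutoff, the function names, and the toy scores are assumptions made for illustration; the abstract does not describe how the threshold was computed.

```python
import numpy as np

def threshold_at_specificity(normal_scores, specificity):
    """Score cutoff at the given quantile of the normal (non-cancer) scores."""
    return float(np.quantile(normal_scores, specificity))

def sensitivity(cancer_scores, threshold):
    """Fraction of samples scoring strictly above the cutoff."""
    return float(np.mean(np.asarray(cancer_scores) > threshold))

# Hypothetical model scores (higher means more cancer-like).
train_normal = np.arange(10) / 10.0
test_normal = np.array([0.05, 0.3, 0.6, 0.95])
test_cancer = np.array([0.5, 0.85, 0.9, 0.95])

# Fix the cutoff on training normals, then apply it unchanged to the test set.
cutoff = threshold_at_specificity(train_normal, 0.90)
test_sens = sensitivity(test_cancer, cutoff)
test_spec = 1.0 - sensitivity(test_normal, cutoff)
```

Fixing the cutoff on training data means the specificity observed on the test set can differ from 90%, which is why it is worth reporting alongside sensitivity.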