Abstract

3055 Background: Despite notable advances in cancer therapeutics, much of the mortality from human cancers results from late diagnosis. Early detection remains an unmet need, and meeting it would allow more effective treatment and a better prognosis. Cell-free DNA (cfDNA) in the circulation offers an emerging approach to non-invasive cancer screening. The fragmentation signatures of cfDNA originating from different cell types or malignancies display distinct patterns, suggesting their potential as cancer biomarkers. This study aims to investigate the fragmentation profiles of cfDNA in several cancer types and to develop a new approach for pan-cancer detection and tissue-of-origin identification.

Methods: Plasma samples were collected from 739 patients with lung (N=577), colorectal (N=98), breast (N=51) and liver (N=13) cancers, and from 716 healthy individuals. cfDNA was isolated and subjected to whole-genome sequencing. We developed an approach called PatternWGS (Pan-cancer detection and tissue-of-origin identification by whole-genome sequencing) to analyze DNA fragmentation patterns comprehensively. Fragmentation features integrating cfDNA fragment size distribution, end motif frequency, and copy number variation were used to construct the prediction model. We built a multi-cancer detection model that distinguishes individuals with and without cancer using a generalized linear model (glmnet), and a cancer-origin model that predicts cancer type using the Random Forest algorithm. Samples were randomly split into a training set (60%) and a test set (40%). Feature selection and model construction used only the training cohort; the test set was used solely for performance evaluation.

Results: Cancer samples contained a larger proportion of short cfDNA fragments. We proposed a fragment score indicating the degree of short fragmentation, and this score was significantly higher in cancers (p<0.001). Several end motifs were specifically enriched in cancer patients. The multi-cancer detection model reached an area under the curve (AUC) of 0.98, with a sensitivity of 93% at 95% specificity. At 95% specificity, the sensitivity for detecting lung, colorectal, breast, and liver cancers was 94.2%, 87%, 100% and 100%, respectively. The cancer-origin model achieved an overall accuracy of 81.7% for tissue-of-origin identification.

Conclusions: Our prediction model demonstrated strong performance for multi-cancer detection and cancer origin prediction, suggesting the potential of plasma cfDNA fragmentation signatures as non-invasive biomarkers for early cancer detection in clinical practice. The distinct fragmentation patterns in cancer provide new insight into the mechanisms of tumorigenesis.
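The abstract reports sensitivity at 95% specificity but does not say how the decision threshold was fixed. One common convention, assumed here and not confirmed by the authors, is to take the cutoff from the score distribution of healthy controls and then count the cancer samples scoring above it. The sketch below illustrates that calculation; the function name `sensitivity_at_specificity` and the toy scores are hypothetical.

```python
import math


def sensitivity_at_specificity(healthy_scores, cancer_scores, specificity=0.95):
    """Return (threshold, sensitivity) at a target specificity.

    Assumption: higher scores indicate cancer, and a sample is called
    positive only when its score is strictly above the threshold.
    """
    if not healthy_scores or not cancer_scores:
        raise ValueError("both score lists must be non-empty")

    ranked = sorted(healthy_scores)
    # Number of healthy samples that must be called negative.
    # The small epsilon guards against floating-point round-up in the product.
    needed = math.ceil(specificity * len(ranked) - 1e-9)
    # The threshold is the needed-th smallest healthy score, so at least
    # `needed` healthy samples fall at or below it.
    threshold = ranked[needed - 1] if needed > 0 else float("-inf")

    detected = sum(1 for score in cancer_scores if score > threshold)
    return threshold, detected / len(cancer_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    healthy = list(range(1, 21))      # 20 healthy controls
    cancer = [5, 19, 25, 30, 40]      # 5 cancer samples
    thr, sens = sensitivity_at_specificity(healthy, cancer, 0.95)
    print(f"threshold={thr}, sensitivity={sens:.2f}")
```

In a setup like the one described, the threshold would be applied to held-out test-set scores so that the reported sensitivity is not inflated by tuning on the evaluation data.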
