Abstract
Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.
Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validations to assess generalization performance.
Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91–0.93) with a mean sensitivity of 85% (95% CI 83–86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance.
Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
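The headline metric, sensitivity at 85% specificity, can be illustrated with a minimal sketch. It assumes each sample has a single classifier score where higher means more cancer-like. The function name `sensitivity_at_specificity` and the toy scores are hypothetical and do not come from the study. The threshold is set from the control scores alone, and sensitivity is then the fraction of cancer samples scoring above it.

```python
import math


def sensitivity_at_specificity(control_scores, cancer_scores, specificity=0.85):
    """Return (threshold, sensitivity) at a fixed specificity.

    The threshold is the lowest control score such that at least
    `specificity` of the controls score at or below it. A sample is
    called positive only if its score is strictly above the threshold.
    """
    if not control_scores or not cancer_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(control_scores)
    # Number of controls that must be called negative; the small tolerance
    # guards against floating-point round-up in the product.
    n_negative = math.ceil(specificity * len(ranked) - 1e-9)
    threshold = ranked[n_negative - 1] if n_negative > 0 else float("-inf")
    detected = sum(1 for score in cancer_scores if score > threshold)
    return threshold, detected / len(cancer_scores)


# Toy scores for illustration only (not data from the study).
controls = list(range(20))
cancers = [10, 15, 16, 17, 18, 20, 25, 30, 40, 50]
threshold, sensitivity = sensitivity_at_specificity(controls, cancers)
print(threshold, sensitivity)
```

When several controls share the threshold score, the achieved specificity can exceed the target, so the reported sensitivity is conservative in that case.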
Highlights
Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests
Paired-end whole-genome sequencing (WGS) was performed on plasma cell-free DNA (cfDNA) obtained from 271 non-cancer control subjects and 546 colorectal cancer (CRC) patients (Table 1)
We have demonstrated that it is possible to take an ML-based approach to learn the relationship between a patient’s cfDNA profile and cancer diagnosis, with 85% sensitivity at 85% specificity in CRC using standard k-fold cross-validation (CV); application of other rigorous and novel CV strategies designed to control for known confounding variables yielded 71–85% sensitivity at 85% specificity
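One way to control for a known confounder during cross-validation is to hold out every sample that shares a confounder value, so the model is always tested on a sequencing batch or institution it never saw in training. The sketch below shows that idea as a leave-one-group-out split. It is an assumption about the general technique, not the study's exact procedure, and the helper `leave_one_group_out` and the batch labels are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per group.

    `groups` holds one confounder label per sample, for example a
    sequencing batch or institution identifier. All samples sharing a
    label are held out together.
    """
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)
    for label in sorted(by_group):
        test_indices = by_group[label]
        train_indices = [i for i, g in enumerate(groups) if g != label]
        yield label, train_indices, test_indices


# Hypothetical batch labels for five samples.
batches = ["A", "A", "B", "C", "B"]
for held_out, train, test in leave_one_group_out(batches):
    print(held_out, train, test)
```

Standard k-fold CV would instead mix samples from one batch across training and test folds, which can let a model exploit batch-specific signal and overstate generalization performance.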
Summary
Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. Early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer. Blood-based screening tests for cancer have been proposed to supplement or replace existing cancer screening methods. One key area of both academic and commercial interest is circulating cfDNA, which includes both tumor-derived DNA (so-called “circulating tumor DNA”, or ctDNA) and DNA derived from non-tumor cells, such as hematopoietic and stromal cells.