Abstract
Accurately predicting the tissue of origin (TOO) is required for blood-based multi-cancer screening, to guide follow-up imaging and care when cancer is detected. However, precisely determining the TOO from cell-free DNA (cfDNA) methylation remains challenging because of the low and variable concentrations of cfDNA and circulating tumor DNA, and the complex, poorly characterized nature of tissue-specific methylation patterns. To address these challenges, we developed a transfer learning approach that leverages methylation profiles learned from tissue biopsy samples to predict the TOO of cancer samples from cfDNA. Our model achieved 89% balanced accuracy (BA) for top-two TOO predictions across 10 cancer groups and outperformed models trained solely on plasma cfDNA data.
Unmatched tissue biopsy (N=517) and blood (N=1,155; NCT05435066) samples were collected from treatment-naïve cancer patients across 21 tumor types, consolidated into 17 cancer groups. All samples were analyzed using a custom targeted bisulfite sequencing hybrid capture assay. Five distinct metrics were used to quantify the informative methylation signal at each TOO region of interest. A feed-forward neural network was first trained on biopsy methylation features, where the cancer signal is strongest. The model was then fine-tuned on plasma cfDNA methylation features, which have a substantially lower signal-to-noise ratio, allowing it to adapt its predictions to the noisier data. The Adam optimizer was used for both training and fine-tuning, with early stopping to prevent overfitting, and dropout layers were incorporated to improve generalization. Performance was evaluated using 10-fold cross-validation on plasma cfDNA samples from the 10 cancer groups with at least 25 plasma samples each (N=1,041). For comparison, the same architecture was trained exclusively on plasma cfDNA samples. Results are reported for samples with tumor content greater than 0.1% (N=415).
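The two-stage training described above (pretrain on high-signal biopsy features, then fine-tune on low-signal plasma features, using Adam, dropout, and early stopping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `FeedForward`, `train`, and `make_data`, the single hidden layer, the dropout rate, the learning rates, and the patience value are all hypothetical, and the synthetic "biopsy" and "plasma" matrices merely stand in for the per-region methylation metrics. The same network object is trained twice; the second call starts fresh Adam moments with a smaller learning rate, and early stopping restores the weights with the lowest validation loss.

```python
import numpy as np


class FeedForward:
    """Hypothetical one-hidden-layer classifier with ReLU and inverted dropout."""

    def __init__(self, n_in, n_hidden, n_classes, dropout=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dropout = dropout
        self.params = {
            "W1": self.rng.normal(0, np.sqrt(2 / n_in), (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": self.rng.normal(0, np.sqrt(2 / n_hidden), (n_hidden, n_classes)),
            "b2": np.zeros(n_classes),
        }

    def forward(self, X, train=False):
        h_relu = np.maximum(0.0, X @ self.params["W1"] + self.params["b1"])
        mask = np.ones_like(h_relu)
        if train and self.dropout > 0:  # dropout is active only during training
            mask = (self.rng.random(h_relu.shape) >= self.dropout) / (1 - self.dropout)
        h = h_relu * mask
        logits = h @ self.params["W2"] + self.params["b2"]
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        return probs, (h_relu, mask, h)

    def loss_and_grads(self, X, y):
        probs, (h_relu, mask, h) = self.forward(X, train=True)
        n = len(y)
        loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
        d_logits = probs.copy()  # gradient of softmax cross-entropy w.r.t. logits
        d_logits[np.arange(n), y] -= 1.0
        d_logits /= n
        grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0)}
        d_pre = (d_logits @ self.params["W2"].T) * mask * (h_relu > 0)
        grads["W1"] = X.T @ d_pre
        grads["b1"] = d_pre.sum(axis=0)
        return loss, grads

    def val_loss(self, X, y):
        probs, _ = self.forward(X)
        return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()


def train(model, X_tr, y_tr, X_val, y_val, lr=1e-3, epochs=200, patience=10,
          batch=32, beta1=0.9, beta2=0.999, eps=1e-8, seed=1):
    """Mini-batch Adam with early stopping on validation loss.

    Weights are never re-initialized here, so calling this a second time on
    new data fine-tunes the existing model (with fresh Adam moments)."""
    m = {k: np.zeros_like(w) for k, w in model.params.items()}
    v = {k: np.zeros_like(w) for k, w in model.params.items()}
    rng = np.random.default_rng(seed)
    best, best_params, wait, t = np.inf, None, 0, 0
    for _ in range(epochs):
        order = rng.permutation(len(y_tr))
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            _, grads = model.loss_and_grads(X_tr[idx], y_tr[idx])
            t += 1
            for k in model.params:  # Adam update with bias correction
                m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
                v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
                m_hat = m[k] / (1 - beta1 ** t)
                v_hat = v[k] / (1 - beta2 ** t)
                model.params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        loss = model.val_loss(X_val, y_val)
        if loss < best - 1e-4:  # improvement: remember these weights
            best, wait = loss, 0
            best_params = {k: w.copy() for k, w in model.params.items()}
        else:
            wait += 1
            if wait >= patience:  # early stopping
                break
    model.params = best_params  # restore the best checkpoint
    return best


def make_data(n, n_features, n_classes, signal, seed):
    """Synthetic stand-in: shared class patterns scaled by `signal`, plus noise."""
    centers = np.random.default_rng(42).normal(size=(n_classes, n_features))
    rng = np.random.default_rng(seed)
    y = rng.integers(0, n_classes, n)
    return signal * centers[y] + rng.normal(size=(n, n_features)), y


if __name__ == "__main__":
    Xb, yb = make_data(600, 50, 5, signal=1.0, seed=1)  # "biopsy": strong signal
    Xp, yp = make_data(300, 50, 5, signal=0.3, seed=2)  # "plasma": weak signal
    model = FeedForward(50, 64, 5)
    train(model, Xb[:500], yb[:500], Xb[500:], yb[500:])  # stage 1: pretrain
    train(model, Xp[:200], yp[:200], Xp[200:250], yp[200:250], lr=3e-4)  # stage 2
    probs, _ = model.forward(Xp[250:])
    print("held-out plasma accuracy:", (probs.argmax(axis=1) == yp[250:]).mean())
```

Running only the second `train` call on a freshly initialized `FeedForward` gives a plasma-only baseline of the kind the abstract compares against; how the two compare on this toy data depends entirely on the assumed signal levels and says nothing about real cfDNA.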
Our TOO prediction achieved 73% BA for top-one predictions, where the true label matched the highest-probability prediction, and 89% BA for top-two predictions, where the true label matched either the highest- or the second-highest-probability prediction. In multi-class classification, our method significantly outperformed a model trained only on plasma cfDNA data, which achieved 69% and 82% BA for top-one and top-two predictions, respectively. Per-indication accuracy showed that, compared with training on plasma data alone, training on biopsy data significantly improves TOO accuracy for tumor types that shed DNA at low rates (breast cancer +14, prostate cancer +12, and uterine cancer +20 percentage points). In conclusion, our transfer learning approach, which uses tissue biopsy methylation data, significantly improves the accuracy of cfDNA-based TOO prediction. It highlights how transfer learning enables automated feature extraction from signal-rich data to enhance the analysis of more challenging, heterogeneous cfDNA samples, advancing non-invasive cancer diagnostics.
Citation Format: Shiva Farashahi, Amirali Kia, Dorna Kashef, Esther Brown, Feras Hantash, Kieran I Chacko. Transfer learning for accurate tissue of origin classification from cfDNA methylation [abstract]. In: Proceedings of the AACR Special Conference: Liquid Biopsy: From Discovery to Clinical Implementation; 2024 Nov 13-16; San Diego, CA. Philadelphia (PA): AACR; Clin Cancer Res 2024;30(21_Suppl):Abstract nr PR019.
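The top-one and top-two BA figures reported in the results can be computed with a short helper. This is a minimal sketch assuming the standard definition of balanced accuracy (the unweighted mean of per-class recall), extended so that a sample counts as correct when its true label is among the k highest-probability classes; the function name `top_k_balanced_accuracy` and the toy inputs are hypothetical. Classes that never appear in `labels` are ignored.

```python
from collections import defaultdict


def top_k_balanced_accuracy(probs, labels, k=1):
    """Unweighted mean over classes of top-k recall.

    A sample counts as a hit when its true label is among the k classes with
    the highest predicted probability (ties keep the lower class index first)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, y in zip(probs, labels):
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        totals[y] += 1
        hits[y] += y in ranked[:k]
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
labels = [0, 1, 1, 2]
print(top_k_balanced_accuracy(probs, labels, k=1))
print(top_k_balanced_accuracy(probs, labels, k=2))
```

On these toy inputs, top-one BA is 2.5/3 (about 0.83), whereas plain accuracy would be 0.75, because the one miss falls in class 1, which has two samples. Top-two BA is 1.0, since the missed class-1 sample ranks its true label second. Averaging per-class recall this way keeps a well-represented cancer group from dominating the score.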