Abstract

Background: Compared with standard cancer screening paradigms, blood-based cancer screening tests can identify asymptomatic cancer patients in a less invasive manner. However, a blood-based multi-cancer detection test must also accurately localize the tissue of origin (TOO) to direct the diagnostic workup, which remains challenging. In this study, we analyzed somatic copy number aberrations (CNA) and a panel of cancer-associated plasma protein markers with artificial intelligence algorithms to recognize TOO patterns for specific cancer types.

Method: CNA data were downloaded from the TCGA SNP6.0 array level 3 dataset. In total, 2600 cancer tissue samples from 13 common cancer types were selected: bladder, breast, cervical, colorectal, endometrial, gastric, liver, lung, lymphoma, ovarian, prostate, renal and thyroid. The human reference genome hg19 was segmented into bins of 5 megabases (Mb) each, and the average log R ratio of each bin was calculated from the SNP6.0 array data. We randomly selected 200 samples from each cancer type as the training set, and their bin log R ratios were used to train the TOO classification algorithm with the random forest method. The remaining samples were used as the validation cohort. This process was repeated 10 times to obtain stable results, and the average prediction value across the 10 repetitions was taken as the final prediction. The same method was applied to expression data for seven plasma tumor proteins from 1005 cancer patient samples across 8 cancer types: breast, colorectal, esophageal, liver, lung, ovarian, pancreatic, and gastric cancer (Cohen, J. D. et al., Science, 2018).

Results: Using the CNA features of the different cancer types, the overall accuracy of TOO classification was 69.4%. Accuracy for individual cancer types varied: it was highest in ovarian cancer (92.2%), followed by renal (84.7%) and thyroid (77.6%) cancer, and lowest in bladder (50.7%) and endometrial (42.6%) cancer. When distinctive protein expression traits were used instead, the overall accuracy of TOO classification was 61.7%, with the highest accuracy in colorectal cancer (79.4%).

Conclusion: Different cancer types acquire distinctive genomic features, which may be causally related to tumorigenesis and progression. This study indicates that genomic features such as CNA can predict cancer types with high accuracy, which warrants further study of this feature in circulating tumor DNA to trace the TOO in a blood-based cancer detection test. We acknowledge that our analysis was based on CNA profiles of tumor tissues, and that blood-based CNA detection from cell-free DNA depends on the fractional concentration of tumor-derived DNA in blood. It is therefore crucial to estimate the CNA concordance between cancer tissue and blood samples in future studies.

Citation Format: Shuaipeng Geng, Wei Wu, Shiyong Li, Mengna Zhang, Yan Chen, Mao Mao. Decoding tissue of origin patterns by tumor DNA and plasma tumor proteins [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-058.
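The binning step described in the Method can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the input is a list of copy-number segments given as `(chrom, start, end, seg_mean)` tuples with half-open coordinates, and it assumes a bin's value is the overlap-weighted mean of the segment log R ratios that fall in it; the abstract states only that an average log R ratio per 5 Mb bin was calculated. The function name `bin_log_r` and the tuple layout are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 5_000_000  # 5 Mb bins, as stated in the abstract


def bin_log_r(segments, bin_size=BIN_SIZE):
    """Return {(chrom, bin_index): overlap-weighted mean log R ratio}.

    segments: iterable of (chrom, start, end, seg_mean) with half-open
    [start, end) coordinates. This layout is an assumption for illustration.
    """
    weighted_sum = defaultdict(float)  # sum of seg_mean * overlapping bases
    covered = defaultdict(int)         # number of bases covered in each bin
    for chrom, start, end, seg_mean in segments:
        if end <= start:
            continue  # skip empty or malformed segments
        first = start // bin_size
        last = (end - 1) // bin_size
        for b in range(first, last + 1):
            # Clip the segment to the boundaries of bin b.
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            overlap = hi - lo
            weighted_sum[(chrom, b)] += seg_mean * overlap
            covered[(chrom, b)] += overlap
    # Bins with no segment coverage are simply absent from the result.
    return {key: weighted_sum[key] / covered[key] for key in covered}


if __name__ == "__main__":
    example = [
        ("chr1", 0, 5_000_000, 0.2),
        ("chr1", 5_000_000, 7_500_000, -0.4),
        ("chr1", 7_500_000, 10_000_000, 0.0),
    ]
    for key, value in sorted(bin_log_r(example).items()):
        print(key, round(value, 3))
```

Under these assumptions, each sample's per-bin values, taken in a fixed bin order, would form the feature vector passed to a classifier such as a random forest. The training, repeated resampling, and averaging of predictions are not shown here.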
