e13606 Background: Next-generation sequencing has made it easier for physicians to recommend tailored treatments for cancer patients. Accurate diagnosis nevertheless still depends on knowing the primary tumor site, which poses a substantial obstacle in carcinomas of unknown primary (CUPs) and hinders optimal patient care. To address this challenge, we propose a machine learning model that identifies the tissue of origin (TOO) across 16 cancer types. Current standards rely on tumor tissue to identify the primary tumor site. By using cell-free DNA (cfDNA), our approach provides a non-invasive and highly sensitive alternative to traditional biopsy or radiography. Methods: cfDNA was extracted from patients diagnosed with carcinoma of known origin. The model was trained on 2,793 cfDNA samples and validated on 1,863 samples. The training and validation cohorts were comparable in sex, age, and cancer type. Samples were analyzed by low-coverage whole genome sequencing (WGS) to capture cfDNA features, including DNA fragment size distribution, copy number variation, nucleosome coverage pattern, mutational signature, and hotspot regions associated with cancer-related mutations. The final model's accuracy was evaluated by whether the true tumor site of origin matched the top 1 prediction or fell within the top 2 predictions. Results: Overall accuracy of TOO predictions in the training cohort was 80.20% (2240/2793) for the top 1 site and 90.26% (2521/2793) for the top 2 sites. Overall accuracy of TOO predictions in the validation cohort was 81.80% (1524/1863) for the top 1 site and 90.07% (1678/1863) for the top 2 sites.
In the validation cohort, accuracy of tumor site prediction was highest for lung cancer (Top 1 = 95.9%; Top 2 = 98.0%) and lowest for biliary system and kidney tumors (Top 1 = 54.3% and 57.4%; Top 2 = 71.6% and 63.2%, respectively). In the clinical setting, TOO estimates for 12 of 16 patients with CUP were concordant with the majority pathological consensus. Of note, two patients with CUP were diagnosed with lung cancer through TOO estimation. Molecular biomarkers of lung adenocarcinoma were also identified through targeted sequencing, which corroborated the diagnosis, and chemoradiotherapy was recommended to the patient. Conclusions: We developed an accurate predictor of TOO by applying a machine learning algorithm to cfDNA mutational, fragmentomic, and epigenetic features. Our results demonstrate high concordance between model-predicted and actual tumor sites across all examined cancer types. This model shows promise for identifying the primary site of cancers of unknown primary origin from non-invasive cell-free DNA samples.
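
As an illustration of the top 1 and top 2 accuracy metric used above, the following is a minimal sketch and not the authors' implementation. It assumes each sample comes with a list of candidate tumor sites ranked from most to least likely; the function names `top_k_accuracy` and `percent` and the toy data are hypothetical. The final lines only recompute the reported percentages from the counts given in the Results.

```python
from typing import Sequence


def top_k_accuracy(true_sites: Sequence[str],
                   ranked_predictions: Sequence[Sequence[str]],
                   k: int) -> float:
    """Fraction of samples whose true site is among the k highest-ranked predictions.

    Assumption: each entry of ranked_predictions is ordered from most to least likely.
    """
    if len(true_sites) != len(ranked_predictions):
        raise ValueError("true_sites and ranked_predictions must have the same length")
    if not true_sites:
        raise ValueError("at least one sample is required")
    hits = sum(1 for truth, ranked in zip(true_sites, ranked_predictions)
               if truth in ranked[:k])
    return hits / len(true_sites)


def percent(correct: int, total: int) -> float:
    """Express correct/total as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


# Hypothetical toy data (not from the study): four samples, two ranked guesses each.
toy_truth = ["lung", "kidney", "biliary", "lung"]
toy_ranked = [
    ["lung", "breast"],      # correct at rank 1
    ["biliary", "kidney"],   # correct at rank 2
    ["liver", "pancreas"],   # missed
    ["lung", "kidney"],      # correct at rank 1
]
toy_top1 = top_k_accuracy(toy_truth, toy_ranked, k=1)
toy_top2 = top_k_accuracy(toy_truth, toy_ranked, k=2)

# Recompute the reported overall accuracies from the counts stated in the Results.
training_top1 = percent(2240, 2793)
training_top2 = percent(2521, 2793)
validation_top1 = percent(1524, 1863)
validation_top2 = percent(1678, 1863)
```

Top 2 accuracy can never be lower than top 1 accuracy on the same samples, because every top 1 hit is also a top 2 hit; the reported figures are consistent with this.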