Abstract

Background: Carcinoma of unknown primary (CUP) accounts for up to 5% of all cancer cases and poses challenges both in identifying the primary cancer site and in treating the disease successfully. DNA methylation abnormalities at 5'-cytosine-phosphate-guanine-3' (CpG) sites across the genome are associated with carcinogenesis, enabling their use as cancer biomarkers for early diagnosis and tissue-of-origin prediction. This study proposes a combined approach for 1) CpG site selection, 2) cancer origin prediction and biomarker identification, and 3) development of an open-source application for user-friendly data analysis.
Methods: To emulate the state-like progression of cancer and to accommodate missing values, methylation beta values were discretized, and both the discretized values and the missing values were tokenized. Feature selection using L1 regularization (lambda = 0.0003) yielded 152 CpG sites from an initial set of 312,792. Three independent sets were selected for each of the three self-attention types: vanilla scaled dot-product, dense synthesizer, and factorized dense synthesizer. Self-attention moves biomarker discovery toward personalization because it generates a unique attention map for each input sample, highlighting the features essential for primary site prediction. A permutation test further evaluated the contribution of the selected CpG sites, confirming biomarker identification for each primary site. An open-source application was developed to predict cancer origin from methylation beta values; its user-friendly interface displays the predicted primary sites and their corresponding percentages.
Results: The independent sets of selected CpG sites perform comparably to the initial set and remain robust (most values well above 97% for precision, recall, and F1) across clinical groups (sample type: normal, primary, metastatic, recurrent; and AJCC pathologic stage) and demographic groups (age, gender, race). The model identifies important biomarkers previously reported in the literature, such as GATA4 and HOXD.
Conclusions: This study offers valuable insights into feature selection, primary site prediction, and biomarker discovery using DNA methylation data, with potential practical applications for healthcare facilities and in personalized cancer treatment.
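
The tokenization and attention steps described in Methods can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the number of bins (N_BINS), the id reserved for missing values (MISSING_TOKEN), the toy two-dimensional embed function, and the use of identity projections (Q = K = V) in place of learned weights. It shows only the vanilla scaled dot-product variant, not the synthesizer variants.

```python
import math

MISSING_TOKEN = 0  # hypothetical token id reserved for missing beta values
N_BINS = 10        # hypothetical number of discretization bins


def tokenize_beta(beta, n_bins=N_BINS):
    """Map a beta value in [0, 1], or a missing value (None/NaN), to an integer token."""
    if beta is None or math.isnan(beta):
        return MISSING_TOKEN
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"beta value out of range: {beta}")
    # Bins are numbered 1..n_bins; beta == 1.0 is clamped into the last bin.
    return min(int(beta * n_bins), n_bins - 1) + 1


def embed(token, n_bins=N_BINS):
    """Toy 2-d embedding (hypothetical): scaled bin index plus a missing-value flag."""
    return [token / n_bins, 1.0 if token == MISSING_TOKEN else 0.0]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections (Q = K = V = x).

    Returns the attended vectors and the attention map (one row per input position).
    """
    d = len(x[0])
    weights = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    output = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return output, weights


# One sample with four CpG sites, one of them missing.
sample = [0.05, 0.55, None, 1.0]
tokens = [tokenize_beta(b) for b in sample]  # [1, 6, 0, 10]
attended, attention_map = self_attention([embed(t) for t in tokens])
```

Each row of attention_map sums to 1 and is specific to the sample, which is the property the abstract relies on when it describes per-sample attention maps. A trained model would use learned embeddings and projection matrices rather than the fixed ones assumed here.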
