Abstract
Cancer of unknown primary (CUP) is a malignancy with poor prognosis in which metastatic disease is present but the primary site cannot be identified. Most patients receive empiric chemotherapy, including platinum-taxane regimens, but experience short survival times. Patients with poor-prognosis CUP could benefit from drug therapy optimized on the basis of primary organ estimation. We constructed and evaluated an ensemble learning model to accurately determine the primary organ using methylation profiles of tumor tissues. Methylation data from 890 samples representing 10 types of cancer from TCGA were analyzed. After data preprocessing, we extracted either the top 10,000 CpG sites based on ANOVA and Gain Ratio or 100 CpG sites from a Gradient Boosting classifier. Performance was evaluated using several machine learning models. Unsupervised analysis was carried out to determine the relationships between the CpG sites selected by Gradient Boosting. Methylation profiling by ANOVA and Gain Ratio yielded favorable performance across various machine learning models. Using Gradient Boosting as a feature selector reduced the number of CpG sites 100-fold without compromising model performance. The training and validation sets showed favorable results for the classification of primary organs with ensemble models. In validation, classification accuracy was 91.2%, 93.5%, 89.7%, and 87.7% for Extreme Gradient Boosting, CatBoost, Random Forest, and Gradient Boosting, respectively. Further profiling showed that the selected methylation regions correlated with cancer type and even revealed subgroups within breast and lung cancers. Gradient Boosting as a feature selector for DNA methylation profiling was highly effective in accurately determining tissue origin. Our study outlines an approach that uses an embedded machine learning algorithm to identify a select set of informative features from complex high-dimensional data to train models and predict cancer type.
Citation Format: Marco A. De Velasco, Kazuko Sakai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio. Machine learning-based classification of tissue origin of cancer using methylation profiles [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4331.
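The abstract does not describe how the ANOVA-based ranking of CpG sites was implemented. As a rough illustration only, the sketch below computes a one-way ANOVA F statistic per CpG site across cancer-type labels and keeps the highest-scoring sites. The function names (`anova_f`, `top_k_features`) and the toy beta values are hypothetical and are not taken from the study or from TCGA.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature across label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = fmean(values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, labels, k):
    """Return the indices of the k columns of X with the largest F statistic."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy methylation beta values (hypothetical): rows are samples, columns are CpG sites.
X = [
    [0.10, 0.50, 0.90],
    [0.12, 0.55, 0.20],
    [0.80, 0.52, 0.85],
    [0.82, 0.48, 0.25],
]
labels = ["breast", "breast", "lung", "lung"]

selected, scores = top_k_features(X, labels, k=1)
print(selected)
```

In this toy input only the first CpG site separates the two cancer types, so it is the one retained. A filter of this kind scores each site independently, whereas the embedded Gradient Boosting selection described in the abstract ranks sites by their contribution to a fitted classifier.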