Abstract

e13597

Background: Accurate identification of tumor origin is crucial for effective diagnosis and treatment, particularly for metastatic tumors. Interpretable machine learning models have shown significant potential for this problem when imaging and immunohistochemistry (IHC) examinations are ineffective. In this study, we developed a gene panel from the expression profiles of various tumors and built a robust machine learning model to identify tumor origin.

Methods: RNA sequencing (RNA-seq) data from 9462 tumor samples originating from 21 different organs were collected from The Cancer Genome Atlas (TCGA). Feature engineering by unsupervised clustering and differential gene expression analysis reduced a pool of over 60,000 identifiers to a refined panel of 164 genes. A logistic regression (LR) classifier was then trained on the 9462 samples using the 164-gene panel, with 10-fold cross-validation applied in multi-class mode. Two independent test sets were assembled from Gene Expression Omnibus (GEO) and other published studies: the Primary tumor set (PT, n=3420, covering 19 tumor types) and the Metastatic tumor set (MTP, n=100, all originating from the prostate and spread to sites such as bone and liver). All samples underwent FPKM normalization, and no sample appeared in both the training and test sets. Model performance was assessed by accuracy, specificity, and sensitivity.

Results: The 164-gene expression panel achieved a cross-validation accuracy of 96.73%. On the PT test set, the model achieved an overall specificity of 99.62%, accuracy of 92.31%, and sensitivity of 89.79%, comparable to similar models in other studies. On the MTP test set, the model correctly traced 91% of metastatic tumors to the prostate, surpassing previous lines of work by a large margin.

Conclusions: Gene expression patterns reveal organ-specific characteristics that can be used to identify tumor origin. Combining a condensed yet comprehensive gene expression panel with a robust machine learning model is a promising tool for tumor diagnosis. Ongoing correlative studies aim to extend predictions from organ of origin to cancer subtypes.
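
The abstract reports accuracy, specificity, and sensitivity for a multi-class problem but does not say how the per-class values were combined. The sketch below shows one common convention, assumed here rather than taken from the study: each tumor type is treated one-vs-rest, and sensitivity and specificity are macro-averaged across classes. The function name `one_vs_rest_metrics` and the toy labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred):
    """Overall accuracy plus macro-averaged sensitivity and specificity.

    Assumption: one-vs-rest counting with macro averaging; the abstract
    does not state which averaging scheme was actually used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    sensitivities, specificities = [], []
    # Classes are taken from the true labels, so tp + fn > 0 for each one.
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        if tn + fp > 0:  # undefined when every sample belongs to this class
            specificities.append(tn / (tn + fp))

    macro = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return {
        "accuracy": accuracy,
        "sensitivity": macro(sensitivities),
        "specificity": macro(specificities),
    }


# Toy labels for illustration only; these are not data from the study.
y_true = ["prostate", "prostate", "prostate", "liver", "liver", "lung"]
y_pred = ["prostate", "prostate", "liver", "liver", "liver", "lung"]
metrics = one_vs_rest_metrics(y_true, y_pred)
print(metrics)
```

Under this convention, each class's specificity is computed over all samples from the other classes, so with many tumor types it tends to sit close to 100% even when sensitivity is noticeably lower. That pattern is consistent with the gap between the reported specificity and sensitivity, although the abstract itself does not confirm the averaging scheme.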
