Abstract
Simple Summary: Gene expression data from different cancer types offer the opportunity to identify cancer tissue-of-origin specific biomarkers and targets. In this study, we used pan-cancer gene expression data to train a deep learning neural network model to identify cancer tissue-of-origin specific gene expression signatures. We identified 976 genes that can reliably classify different cancer types with >97% accuracy.
Cancer tissue-of-origin specific biomarkers are needed for effective diagnosis, monitoring, and treatment of cancers. In this study, we analyzed transcriptomics data from 37 cancer types provided by The Cancer Genome Atlas (TCGA) to identify cancer tissue-of-origin specific gene expression signatures. We developed a deep neural network model to classify cancers based on gene expression data. The model achieved a predictive accuracy of >97% across cancer types, indicating the presence of distinct cancer tissue-of-origin specific gene expression signatures. We interpreted the model using Shapley additive explanations to identify specific gene signatures that significantly contributed to cancer-type classification. We evaluated the model and the validity of gene signatures using an independent test data set from the International Cancer Genome Consortium. In conclusion, we present a robust neural network model for accurate classification of cancers based on gene expression data and also provide a list of gene signatures that are valuable for developing biomarker panels for determining cancer tissue-of-origin. These gene signatures serve as valuable biomarkers for determining tissue-of-origin for cancers of unknown primary.
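To illustrate the idea behind Shapley additive explanations mentioned above, here is a minimal sketch that computes exact Shapley values for a toy scoring function. It is not the model or code used in the study: the gene names, weights, baseline values, and the `model` function are all hypothetical, and exact enumeration over feature orderings is used here only because the example has three features (practical tools rely on approximations for thousands of genes).

```python
from itertools import permutations

# Hypothetical toy "model": a score for one cancer type computed from three
# made-up gene expression values. This is NOT the network from the study.
WEIGHTS = {"GENE_A": 2.0, "GENE_B": -2.0, "GENE_C": 0.25}
# Hypothetical reference (background) expression used for "absent" features.
BASELINE = {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0}


def model(expression):
    """Linear score plus one interaction term between GENE_A and GENE_B."""
    linear = sum(WEIGHTS[g] * expression[g] for g in WEIGHTS)
    return linear + 0.5 * expression["GENE_A"] * expression["GENE_B"]


def shapley_values(sample, baseline=BASELINE):
    """Exact Shapley values: average each gene's marginal contribution
    over every possible order in which genes are switched from their
    baseline value to the sample's value."""
    genes = list(sample)
    totals = {g: 0.0 for g in genes}
    orderings = list(permutations(genes))
    for order in orderings:
        current = dict(baseline)
        previous_score = model(current)
        for gene in order:
            current[gene] = sample[gene]
            score = model(current)
            totals[gene] += score - previous_score
            previous_score = score
    return {g: totals[g] / len(orderings) for g in genes}


sample = {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 3.0}
phi = shapley_values(sample)

# Rank genes by the magnitude of their contribution to this prediction.
ranked = sorted(phi, key=lambda g: abs(phi[g]), reverse=True)
print(phi, ranked)
```

The per-gene values sum to the difference between the score for the sample and the score for the baseline, which is the additivity property that makes such attributions usable for ranking candidate signature genes.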
Highlights
The identities and phenotypes of different cell types are governed by underlying gene expression patterns
We developed a model that classified different cancer types based on transcriptomic data with >97% accuracy
Skin cutaneous melanoma (SKCM) and uveal melanoma (UVM) formed adjacent but well-separated clusters. These results demonstrate that our deep neural network (DNN) model captures cancer tissue-of-origin specific gene expression signatures through enhanced pattern recognition
Summary
The identities and phenotypes of different cell types are governed by underlying gene expression patterns. While all cells contain the same genetic information, only a subset of genes is expressed in a given cell type. Identification of unique gene expression signatures associated with different cancers is valuable for developing diagnostic biomarkers and therapeutic targets. For example, the expression of prostate-specific antigen (PSA) is elevated in prostate cancer patients [1]. Identification of such signatures requires pan-cancer studies that investigate gene expression patterns associated with different cancer types. Until recently, large numbers of gene expression profiles from multiple cancer types were not available. The advent of high-throughput sequencing methods has revolutionized cancer genomics and transcriptomics.