Abstract

Background: State-of-the-art deep-learning-based cancer type prediction can only predict cancer types whose samples are available during training, where sample sizes are typically large. In this paper, we consider how to use the existing training samples to predict cancer types unseen during training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types, and we propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function.

Results: We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from the Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was used to facilitate the training of the CancerSiamese models. CancerSiamese was tested on different N-way predictions and yielded average accuracy improvements of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine the top 100 and top 200 marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer-related functions between primary and metastatic tumors.

Conclusion: This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. It could thus inspire new and ingenious applications of one-shot and few-shot learning for improving cancer diagnosis, prognosis, and our understanding of cancer.

Highlights

  • State-of-the-art deep-learning-based cancer type prediction can only predict cancer types whose samples are available during training, where sample sizes are typically large

  • We developed CancerSiamese, a Siamese convolutional neural network (SCNN) model that contains two identical 1D convolutional neural networks (CNNs), which learn cancer type representations of the query and support samples, followed by a metric-learning layer that predicts whether the two representations are similar

  • CancerSiamese networks were trained separately on the Cancer Genome Atlas (TCGA) and the MET500 metastatic cancer cohort (MET500) training datasets, using the Keras deep learning (DL) platform with the TensorFlow backend [24]
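
The N-way one-shot prediction and the 1-NN benchmark mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy three-gene profiles, and the use of negative Euclidean distance as the 1-NN similarity are all assumptions made for illustration. A trained CancerSiamese model would supply a learned similarity score in place of `one_nn_similarity`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn_similarity(query, support):
    """1-NN baseline score (assumed here): closer profiles count as more similar."""
    return -euclidean(query, support)


def predict_n_way(query, support_set, similarity):
    """Return the label of the support sample most similar to the query.

    support_set maps each of the N candidate cancer types to ONE support
    profile, which is what makes the task one-shot.
    """
    return max(support_set, key=lambda label: similarity(query, support_set[label]))


def n_way_accuracy(episodes, similarity):
    """Fraction of (query, true_label, support_set) episodes predicted correctly."""
    correct = sum(
        predict_n_way(query, support_set, similarity) == true_label
        for query, true_label, support_set in episodes
    )
    return correct / len(episodes)


# Toy 3-way example with made-up three-gene profiles (hypothetical data).
support = {
    "type_A": [5.0, 0.1, 0.2],
    "type_B": [0.2, 4.8, 0.1],
    "type_C": [0.1, 0.3, 5.1],
}
episodes = [
    ([4.7, 0.3, 0.0], "type_A", support),
    ([0.0, 5.2, 0.4], "type_B", support),
    ([0.3, 0.2, 4.9], "type_C", support),
]
print(n_way_accuracy(episodes, one_nn_similarity))
```

Passing the similarity as a function keeps the evaluation loop identical for the baseline and for a learned model, so the two can be compared on the same episodes.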


Summary

Introduction

State-of-the-art deep-learning-based cancer type prediction can only predict cancer types whose samples are available during training, where sample sizes are typically large. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types, and we propose a novel one-shot learning model called CancerSiamese to learn this common representation. It is becoming increasingly clear that, although molecular profiles can accurately predict current cancer types, the spectrum of cancer transcends existing tumor lineages, underscoring the need for a molecular-based classification of individual tumors. This emergent perspective on cancer fosters a more effective “precision cancer therapy,” which advocates specialized diagnosis and treatment based on individual patients’ molecular makeup [5].
