Convolutional neural network models for cancer type prediction based on gene expression

Milad Mostavi, Yufei Huang, Yu-Chiao Chiu, Yidong Chen

doi:10.1186/s12920-020-0677-2

Abstract

Background: Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers.

Results: In this paper, we introduce several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined 10,340 samples of 33 cancer types and 713 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9–95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and the others was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to the prediction of breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code is available at https://github.com/chenlabgccri/CancerTypePrediction.

Conclusions: Here we present novel CNN designs for accurate and simultaneous cancer/normal and cancer type prediction based on gene expression profiles, and a unique model interpretation scheme to elucidate the biological relevance of cancer marker genes after eliminating the effects of tissue of origin. The proposed model has a light set of hyperparameters to be trained and thus can be easily adapted to facilitate cancer diagnosis in the future.
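To make the classification setup concrete, the following is a minimal NumPy sketch of a forward pass through a one-layer 1D CNN that maps a gene expression vector to probabilities over the 34 classes. It is not the authors' implementation: the filter count, kernel width, stride, random weights, and the helper names `conv1d_forward` and `predict_proba` are all illustrative assumptions, and the vector length of 7100 is chosen only because it matches the 100 × 71 matrix mentioned in the Highlights.

```python
import numpy as np

N_CLASSES = 34  # 33 cancer types plus normal, as stated in the abstract


def conv1d_forward(x, kernels, stride):
    """Strided 1D convolution (no padding) followed by ReLU.

    x: (n_genes,) expression vector; kernels: (n_filters, width).
    Returns an array of shape (n_filters, n_windows).
    """
    width = kernels.shape[1]
    n_windows = (x.shape[0] - width) // stride + 1
    # Gather the strided windows into a (n_windows, width) matrix.
    idx = np.arange(width)[None, :] + stride * np.arange(n_windows)[:, None]
    windows = x[idx]
    return np.maximum(kernels @ windows.T, 0.0)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict_proba(x, kernels, stride, w_out, b_out):
    """Conv layer -> flatten -> dense softmax over the 34 classes."""
    features = conv1d_forward(x, kernels, stride).ravel()
    return softmax(w_out @ features + b_out)


# Illustrative sizes and random weights only (assumptions, not the paper's settings).
rng = np.random.default_rng(0)
n_genes, n_filters, width, stride = 7100, 8, 71, 71
x = rng.random(n_genes)
kernels = rng.normal(scale=0.1, size=(n_filters, width))
n_features = n_filters * ((n_genes - width) // stride + 1)
w_out = rng.normal(scale=0.01, size=(N_CLASSES, n_features))
b_out = np.zeros(N_CLASSES)

probs = predict_proba(x, kernels, stride, w_out, b_out)
predicted_class = int(np.argmax(probs))
```

With untrained random weights the predicted class is meaningless; the sketch only shows how an unstructured expression vector flows through a convolution and a softmax layer to yield one probability per class.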

Highlights

Precise prediction of cancer types is vital for cancer diagnosis and therapy
Utilizing the entire collection of The Cancer Genome Atlas (TCGA) gene expression data sets, covering all 33 cancer types and nearly 700 normal samples from various tissues of origin, we examined the accuracies of tumor type prediction before and after removing the influence of tissue-specific genes' expression
The input for the 1D-Convolutional Neural Network (CNN) (Fig. 1a) is a 1D vector ordered alphabetically by gene symbol, while the inputs for the 2D-Vanilla-CNN and 2D-Hybrid-CNN models (Fig. 1b,c) are reshaped into a matrix of 100 rows by 71 columns
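The input preparation described in the last highlight can be sketched as follows. This is a hedged illustration, not the authors' preprocessing code: the function name `to_model_inputs`, the zero-padding of missing positions, and the row-major fill order of the matrix are assumptions, since the text states only the alphabetical ordering and the 100 × 71 shape.

```python
import numpy as np

N_ROWS, N_COLS = 100, 71  # matrix shape stated in the text


def to_model_inputs(expression):
    """Build the 1D and 2D inputs from a {gene_symbol: value} mapping."""
    symbols = sorted(expression)  # alphabetical order of gene symbols
    vector = np.array([expression[s] for s in symbols], dtype=np.float32)
    if vector.size > N_ROWS * N_COLS:
        raise ValueError("more genes than the 100 x 71 matrix can hold")
    # Assumption: zero-pad when fewer than 7100 genes are present.
    padded = np.zeros(N_ROWS * N_COLS, dtype=np.float32)
    padded[: vector.size] = vector
    # Assumption: row-major fill, so consecutive genes share a row.
    return padded, padded.reshape(N_ROWS, N_COLS)


# Toy sample with three genes; the expression values are made up.
sample = {"TP53": 2.0, "ESR1": 5.5, "GATA3": 7.1}
vec, mat = to_model_inputs(sample)
```

Here `vec` would feed a 1D model and `mat` a 2D model. Because genes are ordered alphabetically rather than by any biological relationship, neighbouring cells of the matrix have no inherent spatial meaning, which is consistent with the abstract's description of the inputs as unstructured.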

Summary

Introduction

Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Instead of using transcriptomic data, DeepCNA [13], a CNN-based classifier, utilized ~15,000 samples with copy number aberrations (CNAs) from COSMIC [14] and Hi-C data from 2 human cell lines, and achieved an accuracy of ~60% in discerning 25 cancer types. While all these attempts achieved high accuracy to some extent, they ignore the existence of tissue of origin within each cancer type. None of these studies systematically evaluated different CNN model constructions and their impact on classification accuracy.

Journal: BMC Medical Genomics	Publication Date: Apr 1, 2020
Citations: 123	License type: open-access