A snapshot neural ensemble method for cancer-type prediction based on copy number variations

Md. Rezaul Karim, Stefan Decker, João Bosco Jares, Ashiqur Rahman, Oya Beyan

doi:10.1007/s00521-019-04616-9

Open Access

https://doi.org/10.1007/s00521-019-04616-9

Abstract

Accurate diagnosis and prognosis of cancer depend on a patient's particular cancer type and molecular traits, which need to be addressed carefully. The discovery of important biomarkers is becoming an important step toward understanding the molecular mechanisms of carcinogenesis, in which genomics data and clinical outcomes need to be analyzed before making any clinical decision. Copy number variations (CNVs) are found to be associated with the risk of individual cancers and hence can be used to reveal genetic predispositions before cancer develops. In this paper, we collect CNV data for about 8000 cancer patients covering 14 different cancer types from The Cancer Genome Atlas. Two different sparse representations of CNVs, based on 578 oncogenes and 20,308 protein-coding genes and including genomic deletions and duplications across the samples, are then prepared. We train Conv-LSTM and convolutional autoencoder (CAE) networks using both representations and create snapshot models. While the Conv-LSTM can capture locally and globally important features, the CAE can utilize unsupervised pretraining to initialize the weights in the subsequent convolutional layers against the sparsity. A model averaging ensemble (MAE) is then applied to combine the snapshot models in order to make a single prediction. Finally, we identify the most significant CNV biomarkers using guided-gradient class activation map plus (GradCAM++) and rank the top genes for different cancer types. Results covering several experiments show fairly high prediction accuracies for the majority of cancer types. In particular, using protein-coding genes, the Conv-LSTM and CAE networks predict cancer types correctly in at least 72.96% and 76.77% of the cases, respectively. In contrast, using oncogenes gives moderately higher accuracies of 74.25% and 78.32%, whereas the snapshot model based on MAE shows an overall accuracy improvement of 2.5%.
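The model averaging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each snapshot model outputs one class-probability vector per sample and that the ensemble takes the unweighted mean of those vectors; the abstract does not specify the weighting, and the function names and toy numbers below are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def average_ensemble(
    snapshot_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Unweighted mean of class probabilities over snapshot models.

    snapshot_probs[m][i][c] is the probability that snapshot m assigns
    class c to sample i. Equal weighting is an assumption of this sketch.
    """
    n_models = len(snapshot_probs)
    n_samples = len(snapshot_probs[0])
    n_classes = len(snapshot_probs[0][0])
    averaged = []
    for i in range(n_samples):
        row = [
            sum(snapshot_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict_labels(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Toy data: 3 snapshots, 2 samples, 3 classes (made-up numbers).
snapshots = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]],
]
avg = average_ensemble(snapshots)
labels = predict_labels(avg)
```

In this toy case the first snapshot alone would assign the second sample to class 1, while the averaged prediction is class 2, which is the kind of smoothing an averaging ensemble is meant to provide.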

Highlights

Cancer results from highly expressed genes due to mutations or alterations in gene regulations that control cell division and cell growth
Using MSeq-Copy number variations (CNVs), we selected a fixed number of genes and extracted the copy numbers (CNs) that overlapped with the gene locations, removing those that fall in non-protein-coding genes because arguably more than 80% of human genes do not encode any protein, i.e., CNs from these regions have little-to-no effect on tumor growth
The second LSTM layer emits an output ‘H’, which is reshaped into a feature sequence and fed into fully connected layers to predict the cancer types at the timestep dimension. This helps produce a sequence vector from the last LSTM layer, which is intended to emphasize the CNVs of specific genes that are highly indicative of a specific cancer type
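The gene-level extraction described in the highlights (copy numbers that overlap gene locations) can be sketched as an interval-overlap computation. This is only an illustration under stated assumptions: coordinates are half-open intervals, overlapping segments are combined by a length-weighted mean, and genes without any overlapping segment get 0.0, which yields a sparse vector. The aggregation rule, the `Segment` and `Gene` types, and the gene names are hypothetical and not taken from the paper.

```python
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    value: float  # copy-number signal; negative = deletion, positive = duplication


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def gene_level_features(
    segments: Sequence[Segment], genes: Sequence[Gene]
) -> List[float]:
    """Return one copy-number feature per gene, in the order of `genes`.

    Assumption of this sketch: segments overlapping a gene are combined
    by an overlap-length-weighted mean; uncovered genes get 0.0.
    """
    features = []
    for gene in genes:
        weighted_sum = 0.0
        covered = 0
        for seg in segments:
            if seg.chrom != gene.chrom:
                continue
            # Length of the intersection of two half-open intervals.
            overlap = min(seg.end, gene.end) - max(seg.start, gene.start)
            if overlap > 0:
                weighted_sum += seg.value * overlap
                covered += overlap
        features.append(weighted_sum / covered if covered else 0.0)
    return features


# Toy data with hypothetical gene names and coordinates.
genes = [
    Gene("GENE_A", "chr1", 100, 200),
    Gene("GENE_B", "chr1", 500, 600),
    Gene("GENE_C", "chr2", 100, 200),
]
segments = [
    Segment("chr1", 50, 150, -1.0),
    Segment("chr1", 150, 300, 0.5),
    Segment("chr2", 0, 1000, 1.0),
]
features = gene_level_features(segments, genes)
```

Restricting `genes` to a fixed list (for example, only protein-coding genes or only oncogenes) gives every sample a feature vector of the same length, which is what a fixed-input network requires.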

Summary

Introduction

Cancer results from highly expressed genes due to mutations or alterations in gene regulation that control cell division and cell growth. CNVs are hypothesized to be of functional significance. Although this significance is not fully understood, it is likely that CNVs are responsible for a considerable proportion of phenotypic variation [39]. Such variations may lead to changes in gene dosage and expression [12]. These changes in gene expression (GE) are responsible for different phenotypic variations or diseases (e.g., disabilities, diabetes, schizophrenia, cancer, and obesity) or are envisaged to be associated with other diseases, e.g., autism spectrum disorder [4, 34, 37]. The extracted CNV data were used to train machine learning (ML) models for cancer identification and type prediction. These approaches, however, are not capable of simultaneous analysis of multiple samples and recurrent CNVs [32].

Related works

Data collection

Data preprocessing

Feature extraction based on protein-coding genes

Feature extraction based on oncogenes

Network constructions and training

Conv-LSTM network

Convolutional autoencoder classifier

Ensemble of classifiers

Networks training

Finding and validating important biomarkers

Hyperparameter tuning

Experiment results

Experiment setup

Performance analysis of individual model

Performance analysis of the ensemble model

Validation of the top biomarkers

Analysis of the common biomarkers

Comparisons with related works

Findings

Conclusion

Journal: Neural Computing & Applications	Publication Date: Nov 30, 2019
Citations: 19	License type: open-access
