Abstract

Background: Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using gene expression data alone. It has been shown that integrating multi-omics data contributes to cancer subtype classification.

Results: A new hierarchical integration deep flexible neural forest framework, named HI-DFNForest, is proposed to integrate multi-omics data for cancer subtype classification. A stacked autoencoder (SAE) is used to learn high-level representations of each omics data type; complex representations are then learned by integrating all learned representations into a layer of autoencoder. The final learned data representations (from the stacked autoencoder) are used to classify patients into different cancer subtypes using the deep flexible neural forest (DFNForest) model. Cancer subtype classification is verified on the BRCA, GBM and OV data sets from TCGA by integrating gene expression, miRNA expression and DNA methylation data. The results demonstrate that integrating multiple omics data improves the accuracy of cancer subtype classification compared with using gene expression data alone, and that the proposed framework achieves better performance than other conventional methods.

Conclusion: The new hierarchical integration deep flexible neural forest framework (HI-DFNForest) is an effective method for integrating multi-omics data to classify cancer subtypes.
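The hierarchical integration step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`train_autoencoder`, `train_stacked_autoencoder`, `hierarchical_integration`), the layer sizes, the tanh activation, the plain gradient-descent training and the random toy data are all assumptions made for illustration. The DFNForest classifier is not reproduced here; the integrated features returned at the end would be passed to whichever classifier is used.

```python
import numpy as np

def train_autoencoder(X, n_hidden, epochs=200, lr=0.1, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on squared reconstruction error.
    Returns a function that encodes new data with the trained weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder output
        R = H @ W2 + b2                   # reconstruction
        err = (R - X) / n                 # gradient of the loss w.r.t. R
        dH = (err @ W2.T) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)
    return encode

def train_stacked_autoencoder(X, layer_sizes):
    """Greedy layer-wise training: each layer encodes the previous layer's output."""
    encoders, H = [], X
    for size in layer_sizes:
        enc = train_autoencoder(H, size)
        encoders.append(enc)
        H = enc(H)

    def encode(Z):
        for enc in encoders:
            Z = enc(Z)
        return Z
    return encode

def hierarchical_integration(omics, layer_sizes, n_integrated):
    """Learn one stacked autoencoder per omics type, concatenate the learned
    representations, and compress them with one more autoencoder layer."""
    parts = []
    for name, X in omics.items():
        sae = train_stacked_autoencoder(X, layer_sizes)
        parts.append(sae(X))
    joined = np.hstack(parts)
    integrator = train_autoencoder(joined, n_integrated)
    return integrator(joined)

# Toy data: 20 samples, three omics types with different feature counts.
rng = np.random.default_rng(1)
omics = {
    "gene_expression": rng.normal(size=(20, 50)),
    "mirna_expression": rng.normal(size=(20, 30)),
    "dna_methylation": rng.normal(size=(20, 40)),
}
features = hierarchical_integration(omics, layer_sizes=[8, 4], n_integrated=3)
print(features.shape)
```

Training each omics type separately before the joint layer keeps one high-dimensional data type from dominating the shared representation, which is the motivation for integrating hierarchically rather than concatenating the raw features.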

Highlights

  • Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer

  • Because cancer is heterogeneous, integrating multi-omics data has been shown to affect cancer subtype classification

  • Cancer subtype classification is verified on breast invasive carcinoma (BRCA), glioblastoma multiforme (GBM) and ovarian cancer (OV) data sets from The Cancer Genome Atlas (TCGA) by integrating gene expression, miRNA expression and DNA methylation data


Summary

Introduction

Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. It has been shown that integrating multi-omics data contributes to cancer subtype classification. The Cancer Genome Atlas (TCGA) [12, 13] project produced different kinds of genome, transcriptome and epigenome information for more than 1100 patient samples from more than 34 cancer types [14]. These sequencing data provide an unprecedented opportunity to study cancer subtypes at the molecular level using multi-omics data [15, 16].

