Abstract

The integration of multi-omics data supports early detection and is relevant to a wide range of cancer diagnosis and treatment applications. Accurate survival prediction for cancer patients remains challenging due to the heterogeneity and complexity of cancer. Recent advances in high-throughput sequencing technologies have made it possible to rapidly generate multiple omics profiles from the same cancer sample, and many recent studies have exploited deep learning to extract biologically relevant latent features that capture this complexity. In this paper, we propose SAMO, a Shared representation learning method employing an Autoencoder structure for Multi-Omics data, inspired by the recent success of variational autoencoders in extracting biologically relevant features. Variational autoencoders are deep neural networks capable of learning meaningful latent spaces. We address the problem of information loss when integrating multiple data sources by formulating a distributed latent space that is learned jointly, in an unsupervised manner, by separate variational autoencoders on each data source. First, we pre-train the variational autoencoders separately to produce shared latent representations. Second, we fine-tune only the encoders and latent representations, together with a supervised classifier, for the prediction task. We evaluate the method on lung cancer multi-omics data combining Illumina Human Methylation 27K and RNA-seq gene expression datasets from The Cancer Genome Atlas (TCGA) data portal.
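The two-stage design described above (per-source variational autoencoders whose latent codes are concatenated into a shared representation, followed by a supervised classifier head) can be sketched minimally as follows. This is an illustrative forward-pass-only sketch, not the authors' implementation: all layer sizes, class names, and the concatenation-based fusion are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

class OmicsVAE:
    """Hypothetical per-omics variational autoencoder (forward pass only).
    One such VAE per data source; dimensions are illustrative."""
    def __init__(self, input_dim, latent_dim):
        self.W_enc = rng.normal(0, 0.1, (input_dim, latent_dim))     # encoder layer
        self.W_mu = rng.normal(0, 0.1, (latent_dim, latent_dim))     # mean head
        self.W_logvar = rng.normal(0, 0.1, (latent_dim, latent_dim)) # log-variance head
        self.W_dec = rng.normal(0, 0.1, (latent_dim, input_dim))     # decoder layer

    def encode(self, x):
        h = np.tanh(x @ self.W_enc)
        return h @ self.W_mu, h @ self.W_logvar  # (mu, log sigma^2)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, the standard VAE reparameterization trick
        eps = rng.normal(size=mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        return z @ self.W_dec  # reconstruction (loss terms omitted in this sketch)

# Stage 1 (unsupervised): one VAE per omics source, pre-trained separately.
# Toy feature counts; real methylation / RNA-seq data are far wider.
methylation_vae = OmicsVAE(input_dim=27, latent_dim=8)
expression_vae = OmicsVAE(input_dim=50, latent_dim=8)

x_meth = rng.normal(size=(4, 27))  # 4 toy samples of methylation features
x_expr = rng.normal(size=(4, 50))  # same 4 samples, gene expression features

z_meth = methylation_vae.reparameterize(*methylation_vae.encode(x_meth))
z_expr = expression_vae.reparameterize(*expression_vae.encode(x_expr))

# Shared latent representation: here, concatenation of the per-source codes.
z_shared = np.concatenate([z_meth, z_expr], axis=1)  # shape (4, 16)

# Stage 2 (supervised): a classifier head on the shared latent space;
# only the encoders and this head would be fine-tuned for prediction.
W_clf = rng.normal(0, 0.1, (z_shared.shape[1], 2))
logits = z_shared @ W_clf
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(z_shared.shape, probs.shape)
```

The sketch omits the reconstruction and KL-divergence losses and all training loops; it only shows how separately encoded omics sources can feed one shared latent space that a downstream classifier consumes.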
