Abstract

Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, however, is critical and challenging. In this study, we propose HetEnc, a novel deep learning-based approach for information domain separation.

Results: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks that represent the original gene expression data as high-level abstracted features. A six-layer fully connected feed-forward neural network is then trained on these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although multi-platform data were used in feature abstraction and model training, HetEnc does not need multi-platform data for prediction. This enables broader application of the trained model, because gene expression profiling for new samples needs to be run on only a single platform, which reduces cost. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.
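The two-stage flow described in the abstract (three encoding networks followed by a six-layer feed-forward classifier, with single-platform input at prediction time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the ReLU and sigmoid activations, the concatenation of the three encodings, and all function names are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Create random (weights, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def relu(x):
    return np.maximum(x, 0.0)

def encode(x, layers):
    """Map raw expression values to abstracted features (one encoder)."""
    h = x
    for w, b in layers:
        h = relu(h @ w + b)
    return h

def classify(features, layers):
    """Feed-forward classifier: ReLU hidden layers, sigmoid output."""
    h = features
    for w, b in layers[:-1]:
        h = relu(h @ w + b)
    w, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

# Hypothetical dimensions: 200 genes, 16 abstracted features per encoder.
n_genes, code_dim = 200, 16

# Three encoding networks; in the paper these are learned from
# multi-platform data, here they are untrained placeholders.
encoders = [init_layers([n_genes, 64, code_dim]) for _ in range(3)]

# Six weight layers (assumed reading of "six-layer"); one model per endpoint.
classifier = init_layers([3 * code_dim, 64, 32, 16, 8, 4, 1])

def predict(x):
    """Predict an endpoint from single-platform expression data only."""
    # Assumption: the three encodings are concatenated before classification.
    features = np.concatenate([encode(x, enc) for enc in encoders], axis=1)
    return classify(features, classifier)

# Five new samples profiled on a single platform (synthetic values).
x_new = rng.normal(size=(5, n_genes))
probs = predict(x_new)
print(probs.shape)
```

The point of the sketch is the data flow: every encoder consumes the same single-platform input at prediction time, so no second platform is required for new samples even though the encoders would have been trained with multi-platform data.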

Highlights

  • The domain separation network extracts image representations into two subspaces: one private component, and another component shared by different domains

  • We implemented a similar idea in HetEnc to represent gene expression data, capturing the platform-shared information by organizing data from different platforms into designated encoding networks

Introduction

Researchers today are generating unprecedented amounts of biological data, and the use of integrated analysis with multi-platform gene expression data in current biological research is increasing [1,2,3,4]. The Genotype-Tissue Expression (GTEx) project provided 1641 samples, covering multiple tissues or body sites, from 175 individuals [2]. These well-established and publicly available resources offer a major opportunity to develop integrative analysis approaches and gain more comprehensive insights. However, effectively integrating multi-platform data into the solution of a single- or multi-task classification problem is both critical and challenging.
