Abstract
Background
Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. Effective integration of multi-platform data into the solution of a single- or multi-task classification problem, however, is critical and challenging. In this study, we propose HetEnc, a novel deep learning-based approach for information domain separation.
Results
HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks to represent the original gene expression data using high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained on these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although multi-platform data are used in feature abstraction and model training, HetEnc does not need multi-platform data for prediction. This enables broader application of the trained model, because gene expression profiling for new samples is reduced to a single platform, which lowers cost. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.
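The two-stage structure described in the abstract (three encoding networks feeding a six-layer fully-connected classifier) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the layer sizes, the ReLU and sigmoid activations, the single-layer encoders, the concatenation of the three encodings, and every name (`dense`, `encode`, `predict`, `N_GENES`, `N_CODE`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GENES = 200  # hypothetical number of input genes (single platform)
N_CODE = 16    # hypothetical size of each encoder's abstracted feature vector


def dense(n_in, n_out):
    """Return random weights and a zero bias for one fully-connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def encode(x, layer):
    """Map expression profiles to abstracted features (one-layer stand-in encoder)."""
    w, b = layer
    return relu(x @ w + b)


# Three separate encoders; in the paper these come from unsupervised training,
# here they are untrained placeholders.
encoders = [dense(N_GENES, N_CODE) for _ in range(3)]

# Six fully-connected layers (assumed widths) ending in one output per endpoint.
sizes = [3 * N_CODE, 32, 32, 16, 16, 8, 1]
classifier = [dense(a, b) for a, b in zip(sizes[:-1], sizes[1:])]


def predict(x):
    """Encode single-platform input three ways, concatenate, then classify."""
    features = np.concatenate([encode(x, enc) for enc in encoders], axis=1)
    h = features
    for w, b in classifier[:-1]:
        h = relu(h @ w + b)
    w, b = classifier[-1]
    logits = h @ w + b
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid for a binary endpoint


x = rng.normal(size=(5, N_GENES))  # five synthetic samples, not real data
probs = predict(x)
print(probs.shape)
```

Note that `predict` takes input from one platform only, mirroring the statement that multi-platform data are needed for training but not for prediction.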
Highlights
Researchers today are generating unprecedented amounts of biological data
The domain separation network extracts image representations into two subspaces: one private component, and another component shared by different domains
We implemented a similar idea in HetEnc to represent gene expression data, capturing platform-shared information by organizing the data from different platforms into designated encoding networks
Summary
Researchers today are generating unprecedented amounts of biological data, and the use of integrated analysis with multi-platform gene expression data in current biological research is increasing [1,2,3,4]. The Genotype-Tissue Expression (GTEx) project, for example, provided 1641 samples covering multiple tissues or body sites from 175 individuals [2]. Such well-established and publicly available resources provide a huge opportunity for developing integrative analysis approaches that yield more comprehensive insights. Effectively integrating multi-platform data into the solution of a single- or multi-task classification problem, however, remains critical and challenging.