Abstract
Identifying cancer subtypes is vital to improving the precision of cancer diagnosis and therapy. Several studies have integrated multiple types of genomics data to investigate cancer subtypes. However, (1) few of them specifically consider the intrinsic correlations within each type of data, and (2) to the best of our knowledge, none considers transcriptome alternative splicing regulation in data integration. In recent years, many cancers have been shown to be related to abnormal alternative splicing regulation. In this paper, we propose a hierarchical deep learning framework, HI-SAE, that integrates gene expression and transcriptome alternative splicing profiles to identify cancer subtypes. We adopt stacked autoencoder (SAE) neural networks to learn high-level representations of each data type separately, and then integrate the learned representations through an additional learning layer to obtain more complex data representations. Based on the final learned representations, we cluster patients into cancer subtype groups. Comprehensive experiments on TCGA breast cancer data demonstrate that our model provides an effective approach to integrating multiple types of transcriptomics data for cancer subtype identification, and that transcriptome alternative splicing data offers distinguishing clues about cancer subtypes.
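The hierarchical flow described above (one stacked autoencoder per data type, an additional layer over the concatenated representations, then clustering) can be illustrated with a minimal NumPy sketch. This is not the HI-SAE implementation: the layer sizes, tanh activation, greedy layer-wise training, synthetic input matrices, and the use of k-means for the clustering step are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def train_autoencoder(X, hidden, epochs=200, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent; return the encoder parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encode
        R = H @ W2 + b2                    # decode (reconstruction)
        err = (R - X) / n                  # gradient of 0.5 * mean squared error
        dH = (err @ W2.T) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def stacked_encode(X, layer_sizes):
    """Greedy layer-wise stacked autoencoder (an assumed training scheme):
    each layer is trained on the codes produced by the previous one."""
    H = X
    for i, size in enumerate(layer_sizes):
        W, b = train_autoencoder(H, size, seed=i)
        H = np.tanh(H @ W + b)
    return H

def kmeans(Z, k, iters=50, seed=0):
    """Plain k-means, used here only as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Synthetic stand-ins for the two data types (60 patients each).
rng = np.random.default_rng(42)
expression = rng.normal(size=(60, 40))   # hypothetical gene expression matrix
splicing = rng.normal(size=(60, 30))     # hypothetical splicing profile matrix

# Level 1: a separate stacked autoencoder per data type.
h_expr = stacked_encode(expression, [16, 8])
h_splice = stacked_encode(splicing, [16, 8])

# Level 2: an additional layer over the concatenated representations.
joint = stacked_encode(np.hstack([h_expr, h_splice]), [6])

# Cluster patients on the final representation (k = 3 chosen arbitrarily).
labels = kmeans(joint, k=3)
```

Training each data type separately before the joint layer is what lets the model capture correlations within a data type first, and only then correlations across types.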