Abstract

Breast cancer subtype identification is of great importance for disease diagnosis and individualized patient therapy. Advances in high-throughput sequencing have produced large volumes of genetic data of many types, and integrating multi-omics data has been shown to improve cancer subtype identification. However, most existing methods use only gene expression to identify cancer subtypes, and most existing clustering methods ignore prior knowledge entirely. In this paper, we propose a new deep learning fusion clustering framework, named DLFC, that integrates multi-omics data (mRNA expression, miRNA expression and DNA methylation) from the TCGA BRCA dataset for breast cancer subtype identification. A stacked autoencoder (SAE) and an autoencoder (AE) are used to learn high-level data representations, and prior biological knowledge guides the representation learning. The final learned high-level representations are used as input to the clustering model for cancer subtype identification. The proposed framework is an effective method for integrating increasingly complex multi-omics data to identify breast cancer subtypes.
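The general shape of such a pipeline (learn a compact representation per omics layer, fuse the representations, then cluster) can be illustrated with a minimal NumPy sketch. This is not the DLFC implementation: the abstract does not specify layer sizes, the fusion scheme, the clustering model, or how prior knowledge enters the loss, so everything below is an assumption made for illustration. The data are random toy matrices rather than TCGA data, the function names (`train_autoencoder`, `fuse_and_cluster`, `kmeans`) are hypothetical, k-means stands in for the unspecified clustering model, and the prior-knowledge guidance is omitted.

```python
import numpy as np

def standardize(X):
    """Scale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder),
    trained by full-batch gradient descent on mean squared error.
    Returns the encoder function and the per-epoch loss history."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # latent code
        err = (H @ W2 + b2) - X           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        gR = 2.0 * err / err.size         # dLoss/dReconstruction
        gZ = (gR @ W2.T) * (1.0 - H ** 2) # backprop through tanh
        W2 -= lr * (H.T @ gR); b2 -= lr * gR.sum(axis=0)
        W1 -= lr * (X.T @ gZ); b1 -= lr * gZ.sum(axis=0)
    return (lambda A: np.tanh(A @ W1 + b1)), losses

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's k-means; a stand-in for the clustering model."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # keep old center if cluster is empty
                centers[j] = Z[labels == j].mean(axis=0)
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def fuse_and_cluster(omics, n_hidden=8, n_fused=4, k=3):
    """Assumed arrangement: one autoencoder per omics layer, then a second
    autoencoder stacked on the concatenated codes, then clustering."""
    codes = []
    for X in omics.values():
        Xs = standardize(X)
        encode, _ = train_autoencoder(Xs, n_hidden)
        codes.append(encode(Xs))
    stacked = np.hstack(codes)
    fuse, _ = train_autoencoder(stacked, n_fused)
    fused = fuse(stacked)
    return fused, kmeans(fused, k)

# Toy stand-ins for the three omics matrices (samples x features).
rng = np.random.default_rng(42)
omics = {
    "mrna": rng.normal(size=(60, 30)),
    "mirna": rng.normal(size=(60, 20)),
    "methylation": rng.normal(size=(60, 25)),
}
fused, labels = fuse_and_cluster(omics)
```

Here `fused` holds one low-dimensional row per sample and `labels` assigns each sample to one of `k` candidate subtypes. On real data the number of clusters, the layer widths, and the training schedule would all need to be chosen and validated rather than fixed as above.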

