Cancer subtype identification classifies cancers into groups according to their molecular characteristics and clinical manifestations, and it is the basis for more personalized diagnosis and therapy. Public datasets such as The Cancer Genome Atlas (TCGA) have collected a massive amount of multi-omics data. The accumulation of these datasets provides unprecedented opportunities to study the mechanisms of cancer and to identify cancer subtypes at a comprehensive level. In this paper, we propose a multi-view robust graph-based clustering (MRGC) method to effectively identify cancer subtypes. Our method first learns robust latent representations from the raw omics data to alleviate the influence of noise. A set of similarity matrices, one per omics view, is then adaptively learned from these new representations. Finally, a global similarity graph is obtained by exploiting the consensus structure across these graphs. The three parts of the method reinforce each other in a mutually iterative manner. We conduct extensive experiments on both generic machine learning datasets and cancer datasets. The experimental results confirm that our model achieves satisfactory clustering performance compared with several state-of-the-art approaches. Moreover, we demonstrate the practicality of MRGC through a case study on hepatocellular carcinoma.
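To make the general idea of multi-view graph-based clustering concrete, the following is a minimal sketch and not the MRGC algorithm itself. It assumes fixed k-nearest-neighbour Gaussian graphs in place of adaptively learned similarity matrices, a plain average in place of the learned consensus graph, and connected components as the final clustering step. The robust representation learning and the iterative coupling between the three parts are omitted. All function names, the bandwidth rule, and the toy data are hypothetical choices made for illustration.

```python
import math


def knn_similarity(points, k):
    """Build a symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Bandwidth: mean distance to the k-th neighbour (an illustrative assumption).
    sigma = sum(sorted(row)[k] for row in dist) / n or 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            w = math.exp(-(dist[i][j] ** 2) / (2 * sigma ** 2))
            # Keep the graph symmetric: an edge exists if either end selects it.
            sim[i][j] = max(sim[i][j], w)
            sim[j][i] = max(sim[j][i], w)
    return sim


def consensus_graph(graphs):
    """Average the per-view graphs with equal weights (a stand-in for learned fusion)."""
    n = len(graphs[0])
    return [[sum(g[i][j] for g in graphs) / len(graphs) for j in range(n)]
            for i in range(n)]


def connected_components(sim, threshold=0.0):
    """Label each sample by the connected component it belongs to."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def cluster_views(views, k):
    """One similarity graph per view, then a consensus graph, then cluster labels."""
    graphs = [knn_similarity(view, k) for view in views]
    return connected_components(consensus_graph(graphs))


# Toy data: two views of the same six samples (hypothetical values).
view_a = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
view_b = [(1.0,), (1.1,), (0.9,), (9.0,), (9.2,), (9.1,)]
labels = cluster_views([view_a, view_b], k=2)
print(labels)
```

On this toy input the first three samples and the last three samples fall into separate components, since no nearest-neighbour edge crosses between the two groups in either view. Real omics data would rarely separate this cleanly, which is the motivation for learning the representations and graphs jointly rather than fixing them in advance.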