Abstract

In recent years, cancer sample clustering based on gene expression data has been studied extensively. Studies have also found that other genomic data in TCGA, besides gene expression data, contain features that can be used for clustering. Integrating these genomic data therefore provides a new source of features for cancer clustering. As a powerful subspace clustering method, Low-Rank Representation (LRR) has delivered an important breakthrough in clustering cancer samples. However, most LRR-based methods are applied only to gene expression data and cannot make full use of the characteristic information in other genomic data. Building on LRR, this paper proposes a novel Multi-Graph Laplacian regularized Low-Rank Representation (MGLLRR) method for cancer sample clustering using multi-omics datasets. To preserve the local geometry of the genomic data, multi-graph regularization is introduced into MGLLRR. The multi-graph Laplacian preserves the hidden non-linear manifold structure in the data, ensuring that the integrated data are smooth along the estimated manifold. To account for the different noise characteristics of the different genomic data, we also introduce a block constraint: each genomic data type is treated as a data block, and a different constraint is imposed on each block. This limits the influence of heterogeneous noise across the genomic data and improves the reliability of tumor clustering. The clustering experiments indicate that MGLLRR is effective for cancer sample clustering and is a practical method for analyzing multiple genomic data.
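
To make the multi-graph Laplacian idea concrete, here is a minimal sketch of how one Laplacian per genomic data block might be built and combined. The abstract does not specify the graph construction or the weighting scheme, so everything here is an assumption for illustration: binary symmetric k-nearest-neighbor graphs, unnormalized Laplacians L = D - W, equal weights across blocks, and the hypothetical helper names `knn_graph_laplacian` and `multi_graph_laplacian`. It does not implement the LRR solver or the block constraint.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric binary k-NN graph.

    X has shape (n_samples, n_features). The k-NN construction is an
    assumption of this sketch, not taken from the paper.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    # Indices of the k nearest neighbors of each sample.
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))
    return D - W

def multi_graph_laplacian(blocks, weights=None, k=5):
    """Weighted sum of per-block Laplacians (equal weights by default).

    `blocks` is a list of (n_samples, n_features_v) arrays, one per
    genomic data type, all describing the same samples in the same order.
    """
    if weights is None:
        weights = np.full(len(blocks), 1.0 / len(blocks))
    weights = np.asarray(weights, dtype=float)
    return sum(w * knn_graph_laplacian(X, k) for w, X in zip(weights, blocks))

# Synthetic stand-ins for three genomic data blocks on the same 30 samples.
rng = np.random.default_rng(0)
n_samples = 30
blocks = [
    rng.normal(size=(n_samples, 50)),
    rng.normal(size=(n_samples, 40)),
    rng.normal(size=(n_samples, 20)),
]
L = multi_graph_laplacian(blocks, k=5)
```

In a graph-regularized LRR objective, a matrix like `L` would typically enter through a trace term that penalizes representations differing between neighboring samples; how MGLLRR weights and uses the individual graphs is described in the full paper, not in this sketch.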
