Abstract

Clustering tumor samples can help identify cancer types and discover new cancer subtypes, which is essential for effective cancer treatment. Although many traditional clustering methods have been applied to tumor samples, algorithms with better performance are still needed. Low-rank subspace clustering has become popular in recent years. In this paper, we propose a novel one-step robust low-rank subspace segmentation method (ORLRS) for clustering tumor samples. For a gene expression data set, we seek its lowest-rank representation matrix and a noise matrix. By imposing a discrete constraint on the low-rank matrix, ORLRS learns the cluster indicators of the subspaces directly, without performing spectral clustering; that is, it performs the clustering task in one step. To improve robustness, a capped norm is adopted to remove extreme outliers in the noise matrix. Furthermore, we develop an efficient algorithm to solve the ORLRS problem. Experiments on several tumor gene expression data sets demonstrate the effectiveness of ORLRS.
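The effect of capping can be illustrated with a small sketch. The exact capped norm used by ORLRS is not stated in this section, so the code below assumes one common form, a column-wise capped ℓ2,1 norm that sums min(‖e_i‖₂, θ) over the columns e_i of the noise matrix; the function name `capped_l21_norm` and the threshold `theta` are hypothetical.

```python
import numpy as np

def capped_l21_norm(E, theta):
    """Sum over columns of min(||e_i||_2, theta).

    Assumed form of a capped norm; not taken from the paper.
    """
    col_norms = np.linalg.norm(E, axis=0)
    return float(np.minimum(col_norms, theta).sum())

rng = np.random.default_rng(0)
E = 0.1 * rng.standard_normal((50, 20))  # small noise on 20 samples (columns)
E[:, 3] += 100.0                         # one extreme outlier sample

theta = 2.0
plain = float(np.linalg.norm(E, axis=0).sum())  # uncapped l2,1 norm
capped = capped_l21_norm(E, theta)

print(f"uncapped: {plain:.2f}, capped: {capped:.2f}")
```

Under this assumption, the outlier column contributes at most θ to the capped value, so a single extreme sample cannot dominate the objective the way it does in the uncapped norm.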

Highlights

  • A tumor is a group of cells that have undergone unregulated growth and often form a mass or lump

  • Many traditional clustering methods, such as hierarchical clustering (HC) [12, 13], self-organizing maps (SOM) [14], nonnegative matrix factorization (NMF) [15, 16], and principal component analysis (PCA) [17,18,19,20], have been used for gene expression data clustering. Gene expression data often contains structures that can be represented and processed by parametric models. Linear subspaces can characterize a given set of data, since they are easy to compute and often effective in real applications. Subspace methods, such as NMF, are essentially based on the assumption that the data is approximately drawn from a low-dimensional subspace

  • A novel one-step robust low-rank subspace clustering method (ORLRS) is proposed for tumor clustering, where the gene expression data set is represented by a low-rank matrix and a noise matrix
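As a rough illustration of why a low-rank representation matrix carries cluster structure, the sketch below uses the known closed-form solution of the noise-free low-rank representation problem (minimize the nuclear norm of Z subject to X = XZ), which is Z = VVᵀ for the skinny SVD X = UΣVᵀ. This is the classical noise-free setting, not the ORLRS model itself; the synthetic data and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per, dim = 30, 10, 3  # ambient dimension, samples per subspace, subspace dimension

# Synthetic data: columns drawn from two independent 3-dimensional subspaces.
B1 = rng.standard_normal((d, dim))
B2 = rng.standard_normal((d, dim))
X = np.hstack([
    B1 @ rng.standard_normal((dim, n_per)),
    B2 @ rng.standard_normal((dim, n_per)),
])  # shape (30, 20)

# Skinny SVD, keeping only the numerically nonzero singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int((s > 1e-10 * s[0]).sum())
V = Vt[:r].T

# Closed-form minimizer of the nuclear norm subject to X = XZ (noise-free case).
Z = V @ V.T

# Largest coefficient linking samples from different subspaces.
cross = float(np.abs(Z[:n_per, n_per:]).max())
print(f"rank: {r}, largest cross-subspace coefficient: {cross:.2e}")
```

For independent subspaces and clean data, Z is block diagonal, so each sample is represented only by samples from its own subspace. With real gene expression data a noise matrix is needed, which is the role of the second term in the decomposition described above.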


Summary

Introduction

A tumor is a group of cells that have undergone unregulated growth and often form a mass or lump. Subspace methods, such as NMF, are essentially based on the assumption that the data is approximately drawn from a low-dimensional subspace. In recent years, these methods have gained much attention. Yu et al. proposed a correntropy-based hypergraph regularized NMF (CHNMF) method for clustering and feature selection [21]. Jiao et al. proposed a hypergraph regularized constrained nonnegative matrix factorization (HCNMF) method for selecting differentially expressed genes and classifying tumor samples [22]. Wang et al. proposed MscNMF, a nonnegative matrix factorization framework based on multisubspace cell similarity learning for unsupervised scRNA-seq data analysis [23]. MscNMF can learn the gene features and cell features of different subspaces, and the correlation and heterogeneity between cells are more prominent across multiple subspaces, so the final cell similarity learning is more satisfactory.
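The low-dimensional subspace assumption behind NMF can be sketched with the standard multiplicative update rules for the Frobenius-norm objective. This is plain NMF, not CHNMF, HCNMF, or MscNMF; the helper name `nmf`, the synthetic data, and the argmax cluster assignment are assumptions made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Basic NMF, X ≈ W @ H, using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Updates keep W and H nonnegative; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative "genes x samples" matrix of exact rank 3.
rng = np.random.default_rng(1)
X = rng.random((100, 3)) @ rng.random((3, 40))

W, H = nmf(X, k=3)
rel_err = float(np.linalg.norm(X - W @ H) / np.linalg.norm(X))

# One simple way to read clusters from H: assign each sample to its largest coefficient.
labels = H.argmax(axis=0)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each sample (column of X) is approximated by a nonnegative combination of the k columns of W, which is the sense in which the data is assumed to lie near a low-dimensional subspace.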

