Abstract

Background: Identifying different types of cancer from gene expression data has become a hotspot in bioinformatics research. Clustering gene expression samples from multiple cancers into their respective classes is an important approach. However, gene expression data are high-dimensional, contain few samples, and are noisy, which makes data mining and analysis difficult. Although many effective and feasible methods address this problem, they may still have shortcomings.

Results: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, which introduces graph regularization based on manifold learning and symmetric sparse constraints into traditional low-rank representation (LRR). The symmetric and sparse constraints alleviate the effect of noise in the raw data on the low-rank representation, and the graph regularization preserves the important intrinsic local geometrical structures of the raw data. We apply this method to cluster multi-cancer samples based on gene expression data, which improves the clustering quality. First, the gene expression data are decomposed by the sgLRR method to obtain a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed and used to cluster the multi-cancer samples with a spectral clustering algorithm, normalized cuts (Ncuts).

Conclusions: A series of comparative experiments demonstrates that the sgLRR method, which is based on low-rank representation, has a clear advantage and strong performance in clustering multi-cancer samples.

Highlights

  • Identifying different types of cancer from gene expression data has become a hotspot in bioinformatics research

  • Motivated by the above methods, and to obtain a better lowest-rank matrix that avoids the simple symmetric operation and preserves the intrinsic local geometrical structures within the raw high-dimensional dataset, we introduce symmetric sparse constraints and graph regularization based on manifold learning into the LRR method and propose the graph regularized low-rank representation method under combined sparse and symmetric constraints, or the sgLRR method for short

  • In this paper, we introduce graph regularization based on manifold learning and symmetric sparse constraints into the original LRR and propose a novel method called sgLRR


Summary

Introduction

Identifying different types of cancer from gene expression data has become a hotspot in bioinformatics research. Researchers have proposed many well-performing methods for mining gene expression data, such as K-means clustering [6], nonnegative matrix factorization (NMF) [7] and principal component analysis (PCA) [8]. Because gene expression data are high-dimensional, the low-rank representation (LRR) method has become popular and promising since its prototype was proposed by Liu et al. [9]. The LRR method preserves the subspace structure of the raw dataset in a lowest-rank representation matrix. Owing to the advantages of this matrix, LRR-based clustering has been widely adopted in many fields, such as facial recognition [11], genetic microarray data clustering [12], image clustering [13] and subspace segmentation [14]. The LRR method achieves good results in processing high-dimensional datasets.
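To make the idea of clustering from a lowest-rank representation matrix concrete, the following is a minimal sketch of the plain LRR baseline, not of sgLRR itself. It assumes noise-free data, for which the problem "minimize the nuclear norm of Z subject to X = XZ" has the known closed-form solution Z = V V^T built from the skinny SVD of X. The affinity |Z| + |Z^T| and the normalized spectral step are common choices used here as a stand-in for Ncuts. The function names, the toy data, and the small k-means loop are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def lrr_noise_free(X, tol=1e-8):
    """Closed-form minimizer of ||Z||_* s.t. X = XZ (columns of X are samples).

    Assumption: noise-free data. This is the plain LRR baseline, not sgLRR.
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))      # numerical rank of X
    V = vt[:r].T                         # basis of the row space of X
    return V @ V.T                       # symmetric by construction

def spectral_labels(W, k, n_iter=50):
    """Normalized spectral clustering on affinity W (a stand-in for Ncuts)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    emb = vecs[:, -k:]                   # top-k eigenvectors as embedding
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-first initialization, then plain k-means.
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((emb[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Toy data (hypothetical): 20 samples drawn from two 2-D subspaces of R^30.
rng = np.random.default_rng(0)
B1 = rng.standard_normal((30, 2))
B2 = rng.standard_normal((30, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 10)),
               B2 @ rng.standard_normal((2, 10))])

Z = lrr_noise_free(X)                    # lowest-rank representation matrix
W = np.abs(Z) + np.abs(Z.T)              # affinity matrix
labels = spectral_labels(W, k=2)         # one cluster label per sample
```

On real gene expression data the noise-free assumption does not hold, which is the motivation for adding an error term and further constraints to LRR; the closed form above is only meant to show how a representation matrix turns into an affinity matrix and then into cluster labels.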

