Abstract
One advantage of single-cell RNA sequencing (scRNA-seq) is its ability to reveal cell heterogeneity through cell clustering. However, clustering cells from scRNA-seq data is challenging because of high transcript amplification noise, data sparsity and outlier cell populations. In this study, we propose a novel sparse subspace clustering method for scRNA-seq analysis called Structured Sparse Subspace Clustering and Completion (S3C2). It assumes that related cells lie in the same subspace, so the relationships among cells can be described within a subspace instead of only between cell pairs. The proposed optimization model is solved by the Linearized Alternating Direction Method of Multipliers, in which data completion and spectral clustering are combined into a single procedure through mutual constraints. Notably, a random walk is used during the iterative optimization to make the coefficient matrix more diagonal, and this has a significant effect. We applied our model to 6 public single-cell datasets and a simulated dataset, with cell numbers ranging from 56 to over 3000, and compared it with 5 state-of-the-art clustering methods. Our model outperforms the other clustering methods in clustering accuracy as evaluated by the Adjusted Rand Index, Normalized Mutual Information, Homogeneity and Completeness, especially compared with other improved sparse subspace clustering methods.
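The four evaluation metrics named above are standard external clustering indices computed from how true cell-type labels and predicted cluster labels co-occur. The following is a minimal, dependency-free sketch, not the authors' evaluation code; the function names are hypothetical, NMI uses the arithmetic-mean normalization (one of several conventions), and at least two cells are assumed.

```python
import math
from collections import Counter


def _pairs(count):
    """Number of unordered pairs that can be drawn from `count` items."""
    return count * (count - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_joint = sum(_pairs(c) for c in joint.values())
    sum_true = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_pred = sum(_pairs(c) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / _pairs(n)  # pair agreements expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def homogeneity_completeness_nmi(labels_true, labels_pred):
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    mutual_info = sum(
        (c / n) * math.log(n * c / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true, h_pred = _entropy(labels_true), _entropy(labels_pred)
    # homogeneity: each cluster holds one cell type; completeness: each type sits in one cluster
    homogeneity = mutual_info / h_true if h_true > 0 else 1.0
    completeness = mutual_info / h_pred if h_pred > 0 else 1.0
    nmi = 2 * mutual_info / (h_true + h_pred) if h_true + h_pred > 0 else 1.0
    return homogeneity, completeness, nmi


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(truth, pred))  # ≈ 0.2424
print(homogeneity_completeness_nmi(truth, pred))
```

Working directly from the contingency counts keeps each metric a few lines long; all four indices are invariant to how cluster IDs are numbered, which is why relabelled but identical partitions score 1.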
Highlights
The selected scRNA-seq gene expression matrix is used as the input of S3C2, and a random walk is applied to enhance the diagonal structure of the coefficient matrix.
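The text does not specify how the random walk is applied inside S3C2. One common construction, assumed here purely for illustration, row-normalizes a nonnegative affinity matrix W (for example |C| + |C|^T built from the coefficient matrix C) into a transition matrix P and takes t steps, P^t. Cells that share neighbours within a group become directly connected, reinforcing the block structure, while groups with no links between them stay disconnected. The function name and the `steps` parameter are hypothetical.

```python
def random_walk_transition(W, steps=2):
    """Return P**steps, where P is the nonnegative matrix W with each row scaled to sum to 1."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    n = len(W)
    P = []
    for i, row in enumerate(W):
        total = sum(row)
        if total > 0:
            P.append([value / total for value in row])
        else:
            # isolated cell: give it a self-loop so every row of P still sums to 1
            P.append([1.0 if k == i else 0.0 for k in range(n)])
    walk = [row[:] for row in P]
    for _ in range(steps - 1):
        # one more step of the walk: walk <- walk @ P
        walk = [[sum(walk[i][k] * P[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    return walk


# Two groups: cells 0-1-2 form a chain, cells 3-4 are linked; no links between groups.
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
P2 = random_walk_transition(W, steps=2)
print(P2[0][2])  # 0.5: cells 0 and 2 are now linked through their shared neighbour 1
print(P2[0][3])  # 0.0: no path between the two groups, so they stay disconnected
```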
Summary
One important problem in scRNA-seq analysis is cell clustering, which can be modelled as an unsupervised problem in machine learning. Various methods for clustering scRNA-seq data have been developed in recent years. Most established clustering methods derive similarity matrices between pairs of cells, which limits their ability to capture relationships among groups of cells, even though cells in a tissue work together biologically. The Sparse Subspace Clustering (SSC) algorithm instead describes the relations among all elements as combinations within the same subspace rather than considering element pairs only [4]: its coefficient matrix is obtained by solving a regularized sparse representation of the data [5], and spectral clustering is then applied to the coefficient matrix.
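The SSC step can be illustrated with a minimal pure-Python sketch. It assumes the standard SSC self-expressive formulation, in which each cell is written as a sparse linear combination of the other cells (zero diagonal, l1 penalty `lam`), solved here by coordinate-descent lasso. This is not the S3C2 model, which additionally performs data completion and couples the coefficient matrix with spectral clustering through the Linearized ADMM; the function names and the values of `lam` and `n_iter` are hypothetical.

```python
def soft_threshold(value, lam):
    """Proximal step for lam * |c|: shrink value toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_self_expression(points, lam=0.01, n_iter=100):
    """Coordinate-descent lasso for each point.

    Row i of the returned C minimizes
        0.5 * ||x_i - sum_j C[i][j] * x_j||^2 + lam * sum_j |C[i][j]|
    with C[i][i] fixed at 0, so no point may represent itself.
    """
    n = len(points)
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c = C[i]
        for _ in range(n_iter):
            for j in range(n):
                if j == i or gram[j][j] == 0:
                    continue
                # correlation of x_j with the residual that excludes x_j's own term
                rho = gram[j][i] - sum(gram[j][k] * c[k] for k in range(n) if k not in (i, j))
                c[j] = soft_threshold(rho, lam) / gram[j][j]
    return C


def affinity(C):
    """Symmetric nonnegative affinity W = |C| + |C|^T used for spectral clustering."""
    n = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


# Toy data: cells 0-2 lie on one line (1-D subspace), cells 3-4 on another.
points = [[1, 0, 0], [2, 0, 0], [-1, 0, 0], [0, 1, 1], [0, 2, 2]]
W = affinity(sparse_self_expression(points))
for row in W:
    print([round(v, 3) for v in row])
```

Because the two toy subspaces are orthogonal, every coefficient linking the groups stays exactly zero and W is block-diagonal. Spectral clustering on W (for example scikit-learn's `SpectralClustering` with `affinity='precomputed'`) would then separate the two groups.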