Joint Low Rank Representation with Symmetric Orthogonal Decomposition for Clustering of scRNA-seq Data.

Abstract

Single-cell RNA transcriptome data offer an excellent opportunity to investigate biological mechanisms such as cellular heterogeneity. Accurate identification of cell subtypes is of great importance for revealing the molecular mechanisms underlying complex diseases. Designing computational methods for cell type identification has been an active research topic recently, and various algorithms have been designed to estimate cell type composition. However, owing to the high sparsity, noise, and dimensionality of available scRNA-seq data, improving prediction performance remains a challenge. In this work, a new cell type identification method, named LRRS, is developed by integrating low rank representation (LRR) and symmetric orthogonal decomposition. Unlike spectral embedding algorithms, in which the number of clusters is predefined, LRRS introduces a new orthogonal symmetric decomposition strategy and adaptively characterizes local properties by measuring the weighted distance in the orthogonal space. To optimize the graph model, an efficient iterative approach is proposed that updates each variable alternately using the alternating direction method of multipliers (ADMM). Based on the resulting similarity matrix, a spectral algorithm is adopted to group the individual cells. To evaluate the performance of LRRS, we applied it to eleven benchmark datasets and compared it with fourteen other state-of-the-art methods in terms of prediction accuracy and normalized mutual information. The comparison results show that LRRS is effective in predicting cell type composition.
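
The last step in the abstract, grouping cells with a spectral algorithm on a learned similarity matrix, can be illustrated with a minimal NumPy sketch. This is not the LRRS implementation: `Z` stands in for whatever representation matrix the ADMM solver returns, the symmetrization `(|Z| + |Z^T|) / 2` is a common convention assumed here, and `spectral_clustering` and `kmeans` are hypothetical helper names written for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation (illustrative helper)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(Z, k):
    """Cluster samples from a (possibly asymmetric) representation matrix Z."""
    # Assumed convention: build a symmetric, non-negative affinity from Z.
    W = (np.abs(Z) + np.abs(Z.T)) / 2.0
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalised affinity D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, eigvecs = np.linalg.eigh(S)
    U = eigvecs[:, -k:]
    # Row-normalise the embedding before k-means.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

# Toy usage: two perfectly separated groups of "cells".
Z = np.zeros((7, 7))
Z[:4, :4] = 1.0
Z[4:, 4:] = 1.0
labels = spectral_clustering(Z, k=2)
```

In this sketch the number of clusters `k` is passed in by hand; how LRRS itself handles the cluster number is described in the paper, not here.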

Similar Papers
  • Peer Review Report
  • 10.7554/elife.75624.sa1
Decision letter: The single-cell chromatin accessibility landscape in mouse perinatal testis development
  • Jan 31, 2022
  • Deborah Bourc'his + 1 more

  • Research Article
  • Cite count: 27
  • 10.1109/jbhi.2020.2991172
SCCLRR: A Robust Computational Method for Accurate Clustering Single Cell RNA-Seq Data.
  • Jan 1, 2021
  • IEEE Journal of Biomedical and Health Informatics
  • Wei Zhang + 2 more

Single-cell RNA transcriptome data present a tremendous opportunity for studying cellular heterogeneity. Identifying subpopulations based on scRNA-seq data has been a hot topic in recent years. Although many researchers have focused on designing elegant computational methods for identifying new cell types, the performance of these methods is still unsatisfactory due to the high dimensionality, sparsity and noise of scRNA-seq data. In this study, we propose a new cell type detection method, named SCCLRR, that learns a robust and accurate similarity matrix. The method simultaneously captures both global and local intrinsic properties of the data based on a low rank representation (LRR) framework. The integrated normalized Euclidean distance and cosine similarity are used to balance the intrinsic linear and nonlinear manifold of the data in the local regularization term. To solve the non-convex optimization model, we present an iterative optimization procedure using the alternating direction method of multipliers (ADMM) algorithm. We evaluate the performance of the SCCLRR method on nine real scRNA-seq datasets and compare it with seven state-of-the-art methods. The simulation results show that SCCLRR outperforms the other methods and is robust and effective for clustering scRNA-seq data. (The code of SCCLRR is freely available for academic use at https://github.com/wzhangwhu/SCCLRR.)
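
As a rough illustration of combining a normalized Euclidean distance with cosine similarity, the sketch below blends the two into one pairwise dissimilarity. The abstract does not say how SCCLRR integrates them, so everything specific here is an assumption: the max-normalization of the Euclidean term, the convex weight `alpha`, and the function name `combined_distance` are all hypothetical.

```python
import numpy as np

def combined_distance(X, alpha=0.5):
    """Blend normalised Euclidean distance and cosine distance (illustrative only).

    X     : (n_cells, n_genes) expression matrix.
    alpha : hypothetical mixing weight in [0, 1]; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    gram = X @ X.T
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * gram, 0.0)
    d_euc = np.sqrt(d2)
    max_d = d_euc.max()
    if max_d > 0:
        d_euc = d_euc / max_d  # assumed normalisation: scale into [0, 1]
    # Cosine distance = 1 - cosine similarity.
    norms = np.maximum(np.sqrt(sq), 1e-12)
    cos_sim = gram / (norms[:, None] * norms[None, :])
    d_cos = 1.0 - np.clip(cos_sim, -1.0, 1.0)
    return alpha * d_euc + (1.0 - alpha) * d_cos

# Toy usage: cells 0 and 1 point in the same direction, cell 2 is orthogonal.
D = combined_distance([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
```

With `alpha = 1` only the Euclidean term contributes and with `alpha = 0` only the cosine term does, which makes the effect of each component easy to inspect on a small dataset.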

  • Research Article
  • Cite count: 4
  • 10.1093/bib/bbae188
ScBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data.
  • Mar 27, 2024
  • Briefings in Bioinformatics
  • Yuyao Zhai + 2 more

Over the past decade, single-cell transcriptomic technologies have experienced remarkable advancements, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. With more and more well-annotated reference data becoming available, a large number of automatic identification methods have sprung up to simplify the annotation process on unlabeled target data by transferring cell type knowledge. However, in practice, the target data often include some novel cell types that are not in the reference data. Most existing works classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to potential batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, thus hurting the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a novel challenge to current cell type identification strategies, which predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, encompassing both spatial and non-spatial dimensions. To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment.
Firstly, we identify the mutual nearest clusters in the reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structure association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each dataset, thereby encouraging discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them across the reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolution network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the pioneers in presenting this pragmatic annotation task, as well as in devising a comprehensive algorithmic framework aimed at resolving this challenge across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.

  • Research Article
  • Cite count: 6
  • 10.1109/tcbb.2022.3173587
Semi-Supervised Deep Learning for Cell Type Identification From Single-Cell Transcriptomic Data.
  • Mar 1, 2023
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Xishuang Dong + 4 more

Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. Deep neural networks have been employed to identify cell types from scRNAseq data with high performance. However, they require a large number of individual cells with accurate and unbiased type annotations to train the identification models. Unfortunately, labeling scRNAseq data is cumbersome and time-consuming, as it involves manual inspection of marker genes. To overcome this challenge, we propose a semi-supervised learning model, "SemiRNet", that uses unlabeled scRNAseq cells together with a limited number of labeled scRNAseq cells to perform cell identification. The proposed model is based on recurrent convolutional neural networks (RCNN) and includes a shared network, a supervised network and an unsupervised network. The proposed model is evaluated on two large-scale single-cell transcriptomic datasets. It is observed that the proposed model achieves encouraging performance by learning from a very limited number of labeled scRNAseq cells together with a large number of unlabeled scRNAseq cells.

  • Research Article
  • Cite count: 32
  • 10.1016/j.cels.2021.04.010
Single-cell co-expression analysis reveals that transcriptional modules are shared across cell types in the brain.
  • May 10, 2021
  • Cell Systems
  • Benjamin D Harris + 3 more

  • Research Article
  • Cite count: 2
  • 10.1186/s12859-020-03547-w
A flexible network-based imputing-and-fusing approach towards the identification of cell types from single-cell RNA-seq data
  • Jun 11, 2020
  • BMC Bioinformatics
  • Yang Qi + 3 more

Background: Single-cell RNA sequencing (scRNA-seq) provides an effective tool to investigate transcriptomic characteristics at single-cell resolution. Due to the low amounts of transcripts in single cells and the technical biases in experiments, raw scRNA-seq data usually include substantial noise, which complicates downstream analyses. Although many methods have been proposed to impute noisy scRNA-seq data in recent years, few of them take into account the prior associations across genes in imputation or integrate multiple types of imputation data to identify cell types. Results: We present a new framework, NetImpute, for the identification of cell types from scRNA-seq data by integrating multiple types of biological networks. We employ a statistical method to detect the noisy data items in scRNA-seq data and develop a new imputation model to estimate the real values of the noisy items by integrating the PPI network and gene pathways. Meanwhile, based on the data imputed by multiple types of biological networks, we propose an integrated approach to identify cell types from scRNA-seq data. Comprehensive experiments demonstrate that the proposed network-based imputation model can accurately estimate the real values of noisy data items, and that integrating the imputation data based on multiple types of biological networks can improve the identification of cell types from scRNA-seq data. Conclusions: Incorporating the prior gene associations in biological networks can potentially help to improve the imputation of noisy scRNA-seq data, and integrating multiple types of network-based imputation data can enhance the identification of cell types. The proposed NetImpute provides an open framework for incorporating multiple types of biological network data to identify cell types from scRNA-seq data.

  • Research Article
  • 10.1089/cmb.2023.0077
ARGLRR: A Sparse Low-Rank Representation Single-Cell RNA-Sequencing Data Clustering Method Combined with a New Graph Regularization.
  • Aug 1, 2023
  • Journal of Computational Biology
  • Zhen-Chang Wang + 5 more

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies involves the employment of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduced adjusted random walk graph regularization in its framework. ARGLRR allows for the capture of both local and global structures in scRNA-seq data. Additionally, the imposition of similarity constraints into the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR surpasses other advanced comparison approaches on nine known scRNA-seq data sets judging by the results. In the normalized mutual information and Adjusted Rand Index metrics on the scRNA-seq data sets clustering experiments, ARGLRR outperforms the best-performing comparative method by 6.99% and 5.85%, respectively. In addition, we visualize the result using Uniform Manifold Approximation and Projection. Visualization results show that the usage of ARGLRR enhances the separation of different cell types within the similarity matrix.
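
The two clustering metrics named above, normalized mutual information (NMI) and the Adjusted Rand Index (ARI), can both be computed from the contingency table of true and predicted labels. The sketch below is a minimal NumPy version for illustration, not code from any of the listed papers. It assumes at least two cells and normalizes NMI by the arithmetic mean of the two entropies; other normalizations exist, and the abstract does not state which one was used. The function names are hypothetical.

```python
import numpy as np

def contingency(a, b):
    """Contingency table counting how often label a[i] co-occurs with b[i]."""
    _, ai = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, bi = np.unique(np.asarray(b).ravel(), return_inverse=True)
    C = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(C, (ai, bi), 1)
    return C

def nmi(a, b):
    """NMI with arithmetic-mean normalisation (an assumed choice)."""
    C = contingency(a, b)
    pij = C / C.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = (pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz])).sum()
    h_a = -(pi[pi > 0] * np.log(pi[pi > 0])).sum()
    h_b = -(pj[pj > 0] * np.log(pj[pj > 0])).sum()
    denom = (h_a + h_b) / 2.0
    return 1.0 if denom == 0 else mi / denom

def ari(a, b):
    """Adjusted Rand Index from pair counts in the contingency table."""
    C = contingency(a, b)
    n = C.sum()
    pairs = lambda x: x * (x - 1) / 2.0  # number of unordered pairs
    sum_ij = pairs(C).sum()
    sum_a = pairs(C.sum(axis=1)).sum()
    sum_b = pairs(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Toy usage: the same partition with swapped label names scores 1 on both metrics.
truth = [0, 0, 1, 1]
pred = [1, 1, 0, 0]
scores = (nmi(truth, pred), ari(truth, pred))
```

Both metrics are invariant to how cluster labels are named, which is why they suit unsupervised clustering comparisons where predicted cluster IDs are arbitrary.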

  • Research Article
  • Cite count: 4
  • 10.1080/10255842.2023.2188106
Joint L2,p-norm and random walk graph constrained PCA for single-cell RNA-seq data
  • Mar 6, 2023
  • Computer Methods in Biomechanics and Biomedical Engineering
  • Tai-Ge Wang + 5 more

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers’ understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data’s local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.

  • Research Article
  • Cite count: 1
  • 10.1101/2024.07.15.603649
SmartImpute: A Targeted Imputation Framework for Single-cell Transcriptome Data
  • Jul 18, 2024
  • bioRxiv
  • Sijie Yao + 2 more

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and tissue transcriptomic complexity. However, the high frequency of dropout events in scRNA-seq data complicates downstream analyses such as cell type identification and trajectory inference. Existing imputation methods address the dropout problem but face limitations such as high computational cost and risk of over-imputation. We present SmartImpute, a novel computational framework designed for targeted imputation of scRNA-seq data. SmartImpute focuses on a predefined set of marker genes, enhancing the biological relevance and computational efficiency of the imputation process while minimizing the risk of model misspecification. Utilizing a modified Generative Adversarial Imputation Network architecture, SmartImpute accurately imputes the missing gene expression and distinguishes between true biological zeros and missing values, preventing overfitting and preserving biologically relevant zeros. To ensure reproducibility, we also provide a function based on the GPT4 model to create target gene panels depending on the tissue types and research context. Our results, based on scRNA-seq data from head and neck squamous cell carcinoma and human bone marrow, demonstrate that SmartImpute significantly enhances cell type annotation and clustering accuracy while reducing computational burden. Benchmarking against other imputation methods highlights SmartImpute’s superior performance in terms of both accuracy and efficiency. Overall, SmartImpute provides a lightweight, efficient, and biologically relevant solution for addressing dropout events in scRNA-seq data, facilitating deeper insights into cellular heterogeneity and disease progression. Furthermore, SmartImpute’s targeted approach can be extended to spatial omics data, which also contain many missing values.

  • Research Article
  • Cite count: 24
  • 10.1109/jbhi.2021.3099127
NMFLRR: Clustering scRNA-Seq Data by Integrating Nonnegative Matrix Factorization With Low Rank Representation.
  • Mar 1, 2022
  • IEEE Journal of Biomedical and Health Informatics
  • Wei Zhang + 3 more

Fast-developing single-cell technologies create unprecedented opportunities to reveal cell heterogeneity and diversity. Accurate classification of single cells is a critical prerequisite for recovering the mechanisms of heterogeneity. However, the scRNA-seq profiles currently available have high dimensionality, sparsity, and noise, which pose challenges for existing clustering methods in grouping cells that belong to the same subpopulation based on transcriptomic profiles. Although many computational methods have been proposed, developing novel and effective computational methods to accurately identify cell types remains a considerable challenge. We present a new computational framework to identify cell types by integrating low-rank representation (LRR) and nonnegative matrix factorization (NMF); this framework is named NMFLRR. The LRR captures the global properties of the original data by using nuclear norms, and a locality-constrained graph regularization term is introduced to characterize the data's local geometric information. The similarity matrix and low-dimensional features of the data can be obtained simultaneously by applying the alternating direction method of multipliers (ADMM) algorithm to update each variable alternately in an iterative way. We finally obtain the predicted cell types by using a spectral algorithm based on the optimized similarity matrix. Nine real scRNA-seq datasets were used to test the performance of NMFLRR and fifteen other competitive methods, and the accuracy and robustness of the simulation results suggest that NMFLRR is a promising algorithm for the classification of single cells. The simulation code is freely available at: https://github.com/wzhangwhu/NMFLRR_code.

  • Research Article
  • Cite count: 5
  • 10.1186/s12864-022-09027-0
Non-negative low-rank representation based on dictionary learning for single-cell RNA-sequencing data analysis
  • Dec 23, 2022
  • BMC Genomics
  • Juan Wang + 6 more

In the analysis of single-cell RNA-sequencing (scRNA-seq) data, how to effectively and accurately identify cell clusters from a large number of cell mixtures is still a challenge. The low-rank representation (LRR) method has achieved excellent results in subspace clustering. However, in previous studies, most LRR-based methods choose the original data matrix as the dictionary. In addition, LRR-based methods usually use a spectral clustering algorithm to complete cell clustering. As a result, there is a matching problem between the spectral clustering method and the affinity matrix, which makes it difficult to ensure an optimal clustering result. Considering these two points, we propose the DLNLRR method to better identify cell types. First, DLNLRR updates the dictionary during the optimization process instead of using a predefined fixed dictionary, so it can carry out dictionary learning and LRR learning at the same time. Second, DLNLRR can perform subspace clustering without relying on a spectral clustering algorithm; that is, clustering can be performed directly on the low-rank matrix. Finally, we carry out a large number of experiments on real single-cell datasets, and the experimental results show that DLNLRR is superior to other scRNA-seq data analysis algorithms in cell type identification.

  • Research Article
  • Cite count: 90
  • 10.1109/tcyb.2018.2811764
LRR for Subspace Segmentation via Tractable Schatten-$p$ Norm Minimization and Factorization.
  • Mar 14, 2018
  • IEEE Transactions on Cybernetics
  • Hengmin Zhang + 4 more

Recently, nuclear norm-based low rank representation (LRR) methods have been popular in several applications, such as subspace segmentation. However, there exist two limitations: one is that the nuclear norm, as a relaxation of the rank function, leads to a suboptimal solution, since the nuclear norm-based minimization subproblem tends to over-relax the singular values and treats each of them equally; the other is that solving LRR problems can be time-consuming because it involves a singular value decomposition of a large-scale matrix at each iteration. To overcome both disadvantages, this paper mainly considers two tractable variants of LRR: one is Schatten-p norm minimization-based LRR (i.e., SpNM_LRR) and the other is Schatten-p norm factorization-based LRR (i.e., SpNFLRR) for p=1, 2/3 and 1/2. By introducing two or more auxiliary variables in the constraints, the alternating direction method of multipliers (ADMM) with multiple updating variables can be devised to solve these variants of LRR. Furthermore, both the computational complexity and the convergence properties are given to evaluate the nonconvex multiblock ADMM algorithms. Several experiments finally validate the efficacy and efficiency of our methods on both synthetic data and real-world data.
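
The per-iteration singular value decomposition mentioned above comes from the proximal step of the nuclear norm, which is commonly solved by singular value thresholding (SVT). The sketch below shows only that standard nuclear-norm step, which treats every singular value equally; it is not the Schatten-p solver for p = 2/3 or 1/2 proposed in the paper, and the function name `svt` is hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm.

    Returns argmin_X  tau * ||X||_* + 0.5 * ||X - M||_F^2.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Every singular value is shrunk by the same amount tau.
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Toy usage: the smaller singular value (1.0) falls below tau and is zeroed.
M = np.diag([3.0, 1.0])
X = svt(M, tau=1.5)
```

Because a full SVD of an n-by-n matrix is needed at every ADMM iteration, this step typically dominates the running time for large n, which is the cost the factorization-based variant in the abstract is meant to reduce.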

  • Peer Review Report
  • 10.7554/elife.70416.sa1
Decision letter: Single-cell RNA sequencing of the Strongylocentrotus purpuratus larva reveals the blueprint of major cell types and nervous system of a non-chordate deuterostome
  • Jul 6, 2021
  • Roger Revilla-I-Domingo + 2 more

  • Research Article
  • 10.1093/nargab/lqae166
AnnoGCD: a generalized category discovery framework for automatic cell type annotation.
  • Sep 28, 2024
  • NAR genomics and bioinformatics
  • Francesco Ceccarelli + 2 more

The identification of cell types in single-cell RNA sequencing (scRNA-seq) data is a critical task in understanding complex biological systems. Traditional supervised machine learning methods rely on large, well-labeled datasets, which are often impractical to obtain in open-world scenarios due to budget constraints and incomplete information. To address these challenges, we propose a novel computational framework, named AnnoGCD, building on Generalized Category Discovery (GCD) and Anomaly Detection (AD) for automatic cell type annotation. Our semi-supervised method combines labeled and unlabeled data to accurately classify known cell types and to discover novel ones, even in imbalanced datasets. AnnoGCD includes a semi-supervised block to first classify known cell types, followed by an unsupervised block aimed at identifying and clustering novel cell types. We evaluated our approach on five human scRNA-seq datasets and a mouse model atlas, demonstrating superior performance in both known and novel cell type identification compared to existing methods. Our model also exhibited robustness in datasets with significant class imbalance. The results suggest that AnnoGCD is a powerful tool for the automatic annotation of cell types in scRNA-seq data, providing a scalable solution for biological research and clinical applications. Our code and the datasets used for evaluations are publicly available on GitHub: https://github.com/cecca46/AnnoGCD/.

  • Research Article
  • Cite count: 47
  • 10.1016/j.neucom.2017.11.052
Robust GBM hyperspectral image unmixing with superpixel segmentation based low rank and sparse representation
  • Dec 5, 2017
  • Neurocomputing
  • Xiaoguang Mei + 5 more
