Abstract

Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Because scRNA-seq data are noisy, sparse, and poorly annotated, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, that analyzes scRNA-seq data by taking advantage of known gene functions. Specifically, we first derive functional gene sets from Gene Ontology (GO) terms and reduce their redundancy through semantic similarity analysis and reduction of the gene repetition rate. Then, we apply kernel principal component analysis to each non-redundant functional gene set to select features, and we combine the features selected for all functional gene sets for subsequent clustering analysis. To test the performance of FEGFS, we applied agglomerative hierarchical clustering to the FEGFS features and compared it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets such as Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910 on Pollen and Goolam, respectively, whereas those of the second-best method are 0.938 and 0.910. For large datasets, FEGFS also outperforms most methods. For example, the ARI of FEGFS is 0.781 on both Klein and Zeisel, which is higher than that of every other method except SC3 (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories.
As an application, we took glioma as an example and demonstrated that our clustering method could identify important glioma-related cell clusters and infer key marker genes for these clusters.
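
The feature extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation: the gene sets below are hypothetical placeholders for the non-redundant GO-derived sets (the GO derivation and redundancy reduction steps are not shown), the data are synthetic, and the RBF kernel, the number of components per set, and the default Ward linkage are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def extract_features(expr, gene_sets, n_components=2, kernel="rbf"):
    """Run kernel PCA on each gene set separately and concatenate the results.

    expr: (n_cells, n_genes) expression matrix.
    gene_sets: dict mapping a set name to a list of gene column indices
               (hypothetical stand-ins for GO-derived functional gene sets).
    """
    blocks = []
    for cols in gene_sets.values():
        sub = expr[:, cols]                      # cells x genes in this set
        k = min(n_components, sub.shape[1])      # never ask for more than genes
        blocks.append(KernelPCA(n_components=k, kernel=kernel).fit_transform(sub))
    return np.hstack(blocks)                     # cells x (sum of components)


def cluster_and_score(features, true_labels, n_clusters):
    """Agglomerative clustering followed by the four metrics named above."""
    model = AgglomerativeClustering(n_clusters=n_clusters)  # Ward linkage by default
    pred = model.fit_predict(features)
    scores = {
        "ARI": adjusted_rand_score(true_labels, pred),
        "NMI": normalized_mutual_info_score(true_labels, pred),
        "HOM": homogeneity_score(true_labels, pred),
        "COM": completeness_score(true_labels, pred),
    }
    return pred, scores


# Synthetic example: 60 cells, 20 genes, two cell groups that differ
# only in the genes of the first (hypothetical) gene set.
rng = np.random.default_rng(0)
expr = rng.normal(scale=0.3, size=(60, 20))
expr[30:, :10] += 5.0
true_labels = np.repeat([0, 1], 30)
gene_sets = {"set_a": list(range(10)), "set_b": list(range(10, 20))}

features = extract_features(expr, gene_sets)
pred, scores = cluster_and_score(features, true_labels, n_clusters=2)
print(features.shape, scores)
```

Running kernel PCA per gene set, rather than once on the full matrix, keeps each block of features tied to one functional category, which is the idea the abstract describes. On real data the number of clusters and the kernel parameters would need to be chosen for the dataset at hand.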

Highlights

  • Biological tissues are composed of a variety of heterogeneous cells, and this heterogeneity has a profound impact on the biological functions of cells

  • Most scRNA-seq cell clustering methods derive the similarity between cell pairs from the complete gene expression matrix, which ignores the role of gene function in cell clustering, both in terms of molecular mechanism and in terms of biological significance

  • We propose a feature extraction method based on gene functional sets, named FEGFS, to analyze and integrate the gene expression characteristics of cells on different functional gene sets derived from Gene Ontology (GO) terms (Figure 1)

Introduction

Biological tissues are composed of a variety of heterogeneous cells, and this heterogeneity has a profound impact on the biological functions of cells. Single-cell RNA sequencing (scRNA-seq) technology [1] allows gene expression to be analyzed at the level of individual cells. Despite the rapid development of scRNA-seq technology, cell clustering based on scRNA-seq remains challenging because of biological fluctuations and protocol-related technical biases in single-cell experiments, together with the high dimensionality and sparsity of scRNA-seq data [6]. Various scRNA-seq clustering methods have been developed in recent years, most of which are based on measuring similarity between cells. Most of these methods derive the similarity between cell pairs from the complete gene expression matrix, which ignores the role of gene function in cell clustering, both in terms of molecular mechanism and in terms of biological significance. Since the differences in morphology and structure among cells are caused by the selective expression of genes, it is more reasonable to analyze scRNA-seq data in terms of functional gene sets.
