Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. Then, a question is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Then another challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data.

Results: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data.

Availability and implementation: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997.

Supplementary information: Supplementary data are available at Bioinformatics online.
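To make the underlying factorization concrete, the following is a minimal sketch of plain PNMF fitted with the standard multiplicative update for the Euclidean objective. It is not the scPNMF implementation: the abstract states that scPNMF changes the initialization and adds a basis selection step, and neither is reproduced here. The function name `pnmf`, the random initialization, the iteration count, and the spectral-norm rescaling are all assumptions made for illustration.

```python
import numpy as np


def pnmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain PNMF sketch (hypothetical helper, not the scPNMF package API).

    X : non-negative array of shape (genes, cells)
    k : number of bases
    Returns a non-negative weight matrix W of shape (genes, k) that
    approximately minimizes ||X - W W^T X||_F^2.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))  # assumed random init; scPNMF uses a different one
    A = X @ X.T                      # genes x genes Gram matrix
    for _ in range(n_iter):
        AW = A @ W
        numer = 2.0 * AW
        denom = W @ (W.T @ AW) + AW @ (W.T @ W) + eps
        W *= numer / denom           # multiplicative update keeps W non-negative
        W /= np.linalg.norm(W, 2)    # rescale by the largest singular value for stability
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((30, 50))         # toy non-negative "genes x cells" matrix
    W = pnmf(X, k=3)
    scores = W.T @ X                 # low-dimensional projection of the cells
    print(W.shape, scores.shape)
```

Because the same matrix `W` both defines the bases and projects the data, a new dataset measured on the same genes could be mapped into the same space with `W.T @ X_new`; this is the property that the abstract relies on for aligning targeted gene profiling data with reference data.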
Highlights
The recent development of single-cell RNA sequencing technologies provides unprecedented opportunities to decipher transcriptome heterogeneity among individual cells (Birnbaum, 2018; Potter, 2018; Zhu et al., 2020).
We first compare the formulation of Projective Non-negative Matrix Factorization (PNMF) with that of principal component analysis (PCA) and non-negative matrix factorization (NMF), and we show that PNMF has the advantages of both PCA and NMF so that it can be a useful tool for scRNA-seq data analysis.
We propose single-cell Projective Non-negative Matrix Factorization (scPNMF), an unsupervised gene selection and data projection method for scRNA-seq data.
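For reference, the three formulations mentioned in the highlights can be written side by side in their standard textbook forms. The notation is an assumption, not taken from the paper: X is a genes-by-cells matrix (centered in the PCA case), W is a genes-by-k matrix of bases, and H is a k-by-cells matrix of coefficients.

```latex
\begin{align*}
\text{PCA:}  \quad & \min_{W}\; \lVert X - W W^{\top} X \rVert_F^2
                     \quad \text{s.t. } W^{\top} W = I_k \\
\text{NMF:}  \quad & \min_{W \ge 0,\; H \ge 0}\; \lVert X - W H \rVert_F^2 \\
\text{PNMF:} \quad & \min_{W \ge 0}\; \lVert X - W W^{\top} X \rVert_F^2
\end{align*}
```

Written this way, PNMF keeps the single-matrix projection structure of PCA (the low-dimensional representation is W^T X) while imposing the non-negativity constraint of NMF on W.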
Summary
The recent development of single-cell RNA sequencing (scRNA-seq) technologies provides unprecedented opportunities to decipher transcriptome heterogeneity among individual cells (Birnbaum, 2018; Potter, 2018; Zhu et al., 2020). Besides scRNA-seq data analysis, informative gene selection is crucial for designing single-cell targeted gene profiling experiments, which we define to include all technologies that measure the expression levels of only a specific set of genes in individual cells. Compared with scRNA-seq, targeted gene profiling technologies have advantages such as capturing spatial information (smFISH and MERFISH), having a lower cost per cell (BART-Seq), and exhibiting a higher sensitivity for detecting lowly expressed genes (HyPR-seq). How to optimize gene selection for targeted gene profiling under a limit on the number of genes remains an open and challenging question. scPNMF is a powerful gene selection method that can guide the experimental design and data analysis of single-cell targeted gene profiling.