GeneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq

Alsu Missarova, Tim Stuart, Mark Atkinson, Maigan Brusko, Todd Brusko, John C Marioni, Jaison Jain, Andrew Butler, Clive Wasserfall, Harry Nick, Rahul Satija, Shila Ghazanfar

doi:10.1186/s13059-021-02548-z

Abstract

scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.
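
The iterative idea described in the abstract can be illustrated with a minimal Python sketch. This is not the geneBasis implementation: it assumes a dense cells-by-genes matrix of normalized expression, Euclidean k-NN graphs, and an absolute-difference error. The function names (knn_indices, neighbour_error, greedy_panel), the default k, and the choice of the highest-variance gene as the starting gene are all assumptions made for this example.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest rows (Euclidean) for each row of X, excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                      # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def neighbour_error(X, nbrs):
    """Per-gene error: sum over cells of |expression - mean expression of neighbours|."""
    predicted = X[nbrs].mean(axis=1)  # X[nbrs] has shape (cells, k, genes)
    return np.abs(X - predicted).sum(axis=0)


def greedy_panel(X, n_genes, k=5, seed_gene=None):
    """Greedily grow a gene panel (hypothetical simplification of the idea above).

    At each step, the gene whose neighbourhood error under the panel-based graph
    most exceeds its error under the all-gene ("true") graph is added.
    """
    n_genes = min(n_genes, X.shape[1])
    true_error = neighbour_error(X, knn_indices(X, k))
    if seed_gene is None:
        # Assumption for this sketch: start from the highest-variance gene.
        seed_gene = int(np.argmax(X.var(axis=0)))
    panel = [seed_gene]
    while len(panel) < n_genes:
        panel_nbrs = knn_indices(X[:, panel], k)
        gap = neighbour_error(X, panel_nbrs) - true_error
        gap[panel] = -np.inf  # never re-select a gene already in the panel
        panel.append(int(np.argmax(gap)))
    return panel
```

For example, calling greedy_panel(X, 4) on a cells-by-genes array X returns a list of four distinct column indices, in the order in which they were selected. Passing seed_gene mimics, in a very reduced form, starting from a gene chosen in advance by the user.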

Highlights

Single-cell RNA sequencing is a fundamental approach for studying transcriptional heterogeneity within individual tissues, organs, and organisms.
More recent technological advances such as single-cell multi-omics assays, CRISPR screens, and spatial transcriptomics go beyond measuring only the transcriptome, facilitating a more complete understanding of the features that underpin cellular function. In many of these cases, including a large number of spatial transcriptomics assays, selecting the set of genes to probe is an important parameter, which in turn calls for appropriate computational tools.
We have shown that geneBasis outperforms existing methods, both in computational speed and in identifying relevant sets of genes, and that it selects genes that characterize both local and global axes of variation that can be recovered from a k-nearest neighbor (k-NN) graph representation of transcriptional similarities. geneBasis also allows user knowledge to be directly incorporated: a set of genes of particular biological relevance can be selected a priori and is then augmented by the algorithm.

Summary

Introduction

Single-cell RNA sequencing (scRNA-seq) is a fundamental approach for studying transcriptional heterogeneity within individual tissues, organs, and organisms (reviewed in [1]). A key step in the analysis of scRNA-seq data is the selection of a set of representative features, typically a subset of genes, that capture variability in the data and that can be used in downstream analysis. Established approaches for feature selection leverage quantitative per-gene metrics that aim to identify genes that display more variability than expected by chance across the population of cells under study. Commonly used methods for detecting highly variable genes (HVGs) utilize the relationship between the mean and standard deviation of expression levels (reviewed in [2]); GiniClust leverages Gini indices [3]; and M3Drop performs dropout-based feature selection [4]. A recently developed approach, scPNMF, further addresses the gene complexity problem by leveraging a Non-Negative Matrix Factorization (NMF) representation of scRNA-seq data, with the selected features suggested to represent interesting biological variability in the data [6]. However, scPNMF relies on the chosen dimension for the NMF representation and does not directly compare informativeness between different factors, which makes it difficult to compare the importance of genes (i.e., gene weights) across factors.
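
To make the notion of a per-gene variability metric concrete, the sketch below computes a plain Gini index for each gene and ranks genes by it. It shows only the raw index, not the GiniClust procedure, and it assumes a dense cells-by-genes matrix of non-negative expression values. The function names gini and rank_genes_by_gini are hypothetical.

```python
import numpy as np


def gini(values):
    """Gini index of a non-negative 1-D sample (0 = perfectly even)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch: no expression means no inequality
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def rank_genes_by_gini(X):
    """Column indices of X (cells x genes), ordered from highest to lowest Gini index."""
    scores = np.array([gini(X[:, j]) for j in range(X.shape[1])])
    return np.argsort(-scores, kind="stable")
```

A gene expressed at the same level in every cell scores 0, whereas a gene expressed in a single cell out of n scores (n - 1) / n, so genes restricted to a small group of cells rise to the top of the ranking.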

Journal: Genome Biology (2021) 22:333
Publication Date: Dec 1, 2021
Citations: 18
License type: open-access