Abstract

The massive size of single-cell RNA sequencing datasets often exceeds the capability of current computational analysis methods to solve routine tasks such as the detection of cell types. Recently, geometric sketching was introduced as an alternative to uniform subsampling. It selects a subset of cells (the sketch) that evenly covers the transcriptomic space occupied by the original dataset, to accelerate downstream analyses and highlight rare cell types. Here, we propose the algorithm Sphetcher, which uses the thresholding technique to efficiently pick representative cells within spheres (as opposed to the typically used equal-sized boxes) that cover the entire transcriptomic space. We show that the spherical sketch computed by Sphetcher constitutes a more accurate representation of the original transcriptomic landscape. Our optimization scheme allows the inclusion of fairness aspects that can encode prior biological or experimental knowledge. We show how fair sampling can inform the inference of the trajectory of human skeletal muscle myoblast differentiation.

Introduction

Single-cell RNA sequencing (scRNA-seq) has emerged as a revolutionary tool that can shed light on many corners of cell biology that were inaccessible to previous approaches. Droplet-based technologies make it possible to profile the expression of every gene in the genome for hundreds of thousands of cells at once, and even experiments profiling the transcriptomes of millions of cells have become increasingly common (Cao et al, 2019). Experiments performed in Hie et al (2019) demonstrated that data-dependent sampling methods do not scale efficiently to large datasets and provide unbalanced samples that hamper downstream analyses. Hie et al (2019) introduced geometric sketching as an alternative approach that efficiently samples cells evenly across gene expression space rather than in proportion to the abundance of cells in a similar state. Hie et al (2019) approximate the transcriptomic space of single cells by equal-sized boxes rather than spheres, and cells are randomly selected from within these boxes as representatives for the sketch.
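
The box-based idea described above can be illustrated with a short sketch. This is a deliberately simplified illustration and not the implementation of Hie et al (2019): the function name `box_sketch`, the fixed `box_side` parameter, and the synthetic two-cluster data are assumptions made for this example only. It places an equal-sized grid over a low-dimensional embedding and picks one random cell from every occupied box, so that a small, distant population is represented regardless of its abundance.

```python
import numpy as np


def box_sketch(X, box_side, rng=None):
    """Pick one random cell per occupied equal-sized box.

    Hypothetical helper for illustration; X is an (n_cells, n_dims) array.
    """
    rng = np.random.default_rng(rng)
    # Map every cell to the integer coordinates of the grid box containing it.
    box_ids = np.floor((X - X.min(axis=0)) / box_side).astype(int)
    # Label cells by box: cells in the same box share one label.
    _, labels = np.unique(box_ids, axis=0, return_inverse=True)
    labels = labels.ravel()
    # Draw one representative uniformly at random from each occupied box.
    picks = [rng.choice(np.flatnonzero(labels == b))
             for b in range(labels.max() + 1)]
    return np.sort(np.array(picks))


# Synthetic example (assumed data): 1000 abundant cells and 10 rare cells.
data_rng = np.random.default_rng(0)
abundant = data_rng.normal(loc=0.0, scale=0.1, size=(1000, 2))
rare = data_rng.normal(loc=5.0, scale=0.1, size=(10, 2))
cells = np.vstack([abundant, rare])

sketch_idx = box_sketch(cells, box_side=0.5, rng=0)
n_rare_in_sketch = int((sketch_idx >= 1000).sum())
print(len(sketch_idx), "cells in sketch;", n_rare_in_sketch, "from the rare cluster")
```

Because every occupied box contributes exactly one cell, the rare cluster always appears in the sketch, whereas a small uniform subsample of the same data could miss it entirely. A sphere-based cover, as proposed here, replaces the grid boxes with balls around the selected cells.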
