Abstract

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges. We present DESC, an unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function. Through iterative self-learning, DESC gradually removes batch effects, as long as technical differences across batches are smaller than true biological variations. Because DESC is a soft clustering algorithm, its cluster assignment probabilities are biologically interpretable and can reveal both discrete and pseudotemporal structure of cells. Comprehensive evaluations show that DESC offers a proper balance of clustering accuracy and stability, has a small memory footprint, does not explicitly require batch information for batch effect removal, and can use GPUs when available. As the scale of single-cell studies continues to grow, we believe DESC will offer a valuable tool for biomedical researchers to disentangle complex cellular heterogeneity.

Highlights

  • Single-cell RNA sequencing can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges

  • Through comprehensive analyses of datasets with varying degrees of complexity, we show that DESC can remove complex batch effects, preserve biological variation, and reveal both discrete and pseudotemporal structure of cells

  • Using a deep neural network, DESC initializes its parameters from an autoencoder and learns a nonlinear mapping from the original scRNA-seq data space to a low-dimensional feature space by iteratively optimizing a clustering objective function
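
The iterative optimization of a clustering objective can be illustrated with a small sketch. The text above does not give the form of the objective, so everything here is an assumption: the Student's t soft assignment, the sharpened target distribution, and the KL-divergence loss follow the general deep embedded clustering recipe, and the function names, toy data, and centroids are hypothetical. The autoencoder is omitted; `z` stands in for cells already mapped to a low-dimensional feature space.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q[i, j] of cell i to cluster j (assumed Student's t kernel)."""
    # Squared Euclidean distance from every embedded cell to every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalise so each row is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p: squares q and divides by soft cluster size."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the assumed clustering loss minimised at each iteration."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
# Toy embedding: two well-separated groups of 50 cells in a 2-D feature space.
z = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
# Deliberately imperfect initial centroids (hypothetical values).
centroids = np.array([[0.5, 0.5], [2.5, 2.5]])

q = soft_assign(z, centroids)   # soft cluster assignment probabilities
p = target_distribution(q)      # self-learning target built from q itself
loss = kl_divergence(p, q)      # quantity a training step would reduce
labels = q.argmax(axis=1)       # hard labels, if a discrete call is needed
```

In a full training loop, the gradient of this loss would update both the encoder weights and the centroids, and `p` would be recomputed periodically from the current `q`. Each row of `q` is a probability vector over clusters, which is what makes the assignments soft.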

Introduction

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever-increasing number of cells and batch effects impose computational challenges. Large scRNA-seq datasets often include cells that are easy to cluster, and it is desirable to learn expression patterns from these cells because they provide valuable information on cluster-specific gene expression signatures. These cells can in turn help improve the clustering of cells that are hard to cluster. As the number of cells in scRNA-seq studies grows, another major analysis challenge is batch effects, which are systematic differences in gene expression from one batch to another[2]. Haghverdi et al.[4] found that modeling cell-type-specific batch effects, rather than a globally constant batch effect for all cells, leads to improved batch effect removal.
