JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing

Tao Cui, Tingting Wang

doi:10.1186/s12864-020-07302-6

Abstract

Background: Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. However, the high frequency of gene expression detection failures in scRNA-Seq data makes it challenging to reliably identify cell types and Differentially Expressed Genes (DEG). Moreover, with the explosive growth of single-cell data generated using the 10x Genomics protocol, existing methods will soon reach their computational limits due to scalability issues. The single-cell transcriptomics field urgently needs new tools and frameworks to facilitate large-scale single-cell analysis.

Results: To improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose a generalized zero-inflated negative binomial mixture model, “JOINT,” that can perform probability-based cell-type discovery and DEG analysis simultaneously without the need for imputation. JOINT performs soft-clustering for cell-type identification by computing, for each cell, the probability of belonging to each cell type, so a cell can belong to multiple cell types with different probabilities. This is drastically different from existing hard-clustering methods, where each cell can belong to only one cell type. The soft-clustering component of the algorithm significantly improves the accuracy and robustness of single-cell analysis, especially when the scRNA-Seq datasets are noisy and contain a large number of dropout events. Moreover, JOINT is able to determine the optimal number of cell types automatically rather than requiring it to be specified empirically. Fitting the proposed model is an unsupervised learning problem, which is solved using the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented using the TensorFlow deep learning framework, which dramatically accelerates data analysis through parallel GPU computing.

Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Python package that we have developed can be readily applied to aid future advances in parallel computing-based single-cell algorithms and research in various biological and biomedical fields.
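To make the soft-clustering idea concrete, the following is a minimal sketch of one E-step for a zero-inflated negative binomial (ZINB) mixture, written with NumPy and SciPy rather than TensorFlow. It is not the authors' implementation: the function names `zinb_logpmf` and `soft_assign`, the mean/dispersion parametrisation, the per-gene independence assumption, and all toy parameter values are assumptions made for illustration only.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import nbinom


def zinb_logpmf(x, mu, r, pi):
    """Log-probability of counts x under a zero-inflated negative binomial.

    mu: NB mean, r: NB dispersion (size), pi: dropout probability (0 < pi < 1).
    These names and this parametrisation are illustrative assumptions.
    """
    p = r / (r + mu)  # convert (mean, size) to SciPy's success probability
    nb_log = nbinom.logpmf(x, r, p)
    # A zero count arises either from dropout or from the NB component itself.
    log_zero = np.logaddexp(np.log(pi), np.log1p(-pi) + nb_log)
    log_nonzero = np.log1p(-pi) + nb_log
    return np.where(x == 0, log_zero, log_nonzero)


def soft_assign(counts, weights, mu, r, pi):
    """E-step: posterior probability of each cell belonging to each cell type.

    counts: (cells, genes); weights: (types,); mu, r, pi: (types, genes).
    Genes are treated as independent given the cell type (an assumption).
    """
    # Broadcast to (cells, types, genes), then sum log-likelihoods over genes.
    log_lik = zinb_logpmf(counts[:, None, :], mu[None], r[None], pi[None]).sum(axis=2)
    log_post = np.log(weights)[None, :] + log_lik
    # Normalise in log space so each row sums to one.
    return np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))


# Toy example: 2 hypothetical cell types, 3 genes, 3 cells.
mu = np.array([[1.0, 20.0, 5.0], [15.0, 2.0, 5.0]])
r = np.full_like(mu, 2.0)
pi = np.full_like(mu, 0.3)
weights = np.array([0.5, 0.5])
counts = np.array([[0, 25, 4], [12, 0, 6], [0, 0, 5]])

resp = soft_assign(counts, weights, mu, r, pi)
print(np.round(resp, 3))
```

In this toy setting the first two cells are assigned almost entirely to one type each, while the third cell, whose two informative genes both dropped out, keeps substantial probability on both types. A hard-clustering method would force that cell into a single type; the posterior matrix `resp` retains the uncertainty, and a full EM procedure would feed it into the M-step parameter updates.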

Highlights

Single-cell RNA-Sequencing has provided single-cell level insights into complex biological processes
We propose a generalized zero-inflated negative binomial mixture model, “JOINT,” that can perform probability-based cell-type discovery and Differentially Expressed Genes (DEG) analysis simultaneously without the need for imputation
We comprehensively evaluated the impact of dropout probability and tested the performance of JOINT on cell-clustering and DEG analysis using simulated and real scRNA-Seq datasets

Summary

Introduction

Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. scRNA-Seq technology has significantly advanced the understanding of human disease and underlying biological processes at the single-cell level [1, 2]. This ever-evolving technique has revealed cell lineage [3], cell-type heterogeneities [4, 5], and distinct patterns of gene expression [6] that cannot be identified by conventional bulk cell analysis. The massive size of scRNA-Seq datasets demands extensive processing time, hindering the applicability of imputation methods to ever-growing collections of scRNA-Seq data [14]. Together, these challenges significantly hinder the progress of scRNA-Seq, both as a technique and in its application to biological and biomedical research.

Journal: BMC Genomics
Publication Date: Jan 11, 2021
Citations: 3
License type: open-access


