Abstract

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the single-cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data, and performing it in the presence of noise from different sources remains a key challenge. Earlier practice addressed this by borrowing methods from bulk RNA-seq, which are based on non-zero differences in the average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to study the performance of DE analysis methods comprehensively. Here, we review and classify DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods, and that their performance depends on the underlying models, the DE test statistic(s), and the data characteristics. Further, it is difficult to identify a single method that performs best globally under any individual performance criterion. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in scRNA-seq.

Highlights

  • The bulk-cell RNA-sequencing (RNA-seq) technique measures the aggregated expression levels of thousands of genes from tissue samples, i.e., from collections of thousands of cells

  • We evaluated the performance of the 19 tested methods (Supplementary Document S5) in identifying genuine Differential Expression (DE) genes using individual performance metrics, namely True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN), True Positive Rate (TPR), False Positive Rate (FPR), False Discovery Rate (FDR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy (ACC), F1 score (F1), and AUROC (Equations (30)–(37)), together with a runtime criterion, on 11 real datasets (Table 3)

  • Our preliminary analytical results indicated that the expected frequencies computed from the Zero-Inflated Negative Binomial (ZINB) model were the closest to their observed counterparts, followed by those from the Negative Binomial (NB) model (Supplementary Tables S5 and S6, and Figure S2)
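
The rate-based metrics named in the highlights can all be derived from the four confusion counts (TP, FP, TN, FN). The following is a minimal sketch using the standard textbook definitions, which may differ in detail from Equations (30)–(37) of the paper; the function name `de_metrics` and its arguments are hypothetical and chosen only for illustration. AUROC is omitted because it requires ranked scores rather than counts.

```python
def de_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a DE gene call set.

    Assumption: textbook definitions, not necessarily the exact
    formulas used in the paper. Undefined ratios (zero denominator)
    are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),                   # sensitivity / recall
        "FPR": ratio(fp, fp + tn),                   # 1 - specificity
        "FDR": ratio(fp, fp + tp),                   # 1 - PPV
        "PPV": ratio(tp, tp + fp),                   # precision
        "NPV": ratio(tn, tn + fn),
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),       # harmonic mean of PPV and TPR
    }


if __name__ == "__main__":
    # Illustrative counts only (not taken from the paper).
    for name, value in de_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Because FDR and PPV sum to one by these definitions, reporting both is redundant for a single call set, but it can still be convenient when comparing methods side by side.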

Summary

Introduction

The bulk-cell RNA-sequencing (RNA-seq) technique measures the aggregated expression levels of thousands of genes from tissue samples, i.e., from collections of thousands of cells. This technology cannot capture cell-to-cell heterogeneity because no cell-specific information is available [1,2]. The single-cell RNA-sequencing (scRNA-seq) technique was developed for studying the expression dynamics of genes at the single-cell level [3]. The scRNA-seq data have unique features, such as low library sizes of cells, stochasticity of gene expression, high noise levels, low capture of mRNA molecules, high dropout rates, amplification bias, multi-modality, and zero-inflation. Biological and technical factors both contribute to the high proportion of zeros in the data, which are characterized as true zeros and false (dropout) zeros, respectively [7,8,9].
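
To make the distinction between the two kinds of zeros concrete, the sketch below writes a zero-inflated negative binomial (ZINB) distribution as a mixture: with probability `pi` a count is an excess zero, and otherwise it is drawn from an NB distribution, which can itself produce zeros. This is an illustration under stated assumptions rather than the model fitted in the paper: it uses the common mean-dispersion parameterization of the NB, and the names `nb_pmf`, `zinb_pmf`, `expected_frequencies`, `mu`, `size`, and `pi` are hypothetical.

```python
import math


def nb_pmf(k, mu, size):
    """NB probability of count k, with mean mu > 0 and dispersion (size) parameter size > 0."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, size, pi):
    """ZINB probability of count k: excess zeros with probability pi, NB otherwise."""
    excess_zero = pi if k == 0 else 0.0
    return excess_zero + (1.0 - pi) * nb_pmf(k, mu, size)


def expected_frequencies(n_cells, mu, size, pi, max_count):
    """Expected number of cells showing each count 0..max_count for one gene."""
    return [n_cells * zinb_pmf(k, mu, size, pi) for k in range(max_count + 1)]


if __name__ == "__main__":
    # Illustrative parameter values only (not estimated from any dataset).
    freqs = expected_frequencies(n_cells=1000, mu=2.0, size=1.0, pi=0.4, max_count=5)
    for count, freq in enumerate(freqs):
        print(f"count={count}: expected cells = {freq:.1f}")
```

Expected frequencies of this kind can be compared with the observed count histogram of a gene. Setting `pi=0` recovers the plain NB model, so the same function serves for both fits.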
