Abstract
Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of nine tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods that were originally developed for bulk-cell RNA-seq data but have been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performance than the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell-based methods can perform similarly to the methods designed for scRNA-seq data. For lowly expressed genes, however, performance varies substantially: edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq-based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.
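The abstract does not spell out how a rediscovery rate is computed. As a minimal sketch, assume the RDR is the fraction of the top-k genes found in a training split that are also significant in a validation split. The function name, the dictionary inputs, the significance threshold `alpha`, and the toy p-values below are all hypothetical and are not taken from the study.

```python
def rediscovery_rate(train_pvalues, valid_pvalues, top_k, alpha=0.05):
    """Fraction of the top_k training genes that are also significant in validation.

    train_pvalues, valid_pvalues: dicts mapping gene name -> p-value
    (hypothetical inputs; this definition of RDR is an assumption).
    """
    # Rank genes by training p-value, smallest first, and keep the top_k.
    ranked = sorted(train_pvalues, key=train_pvalues.get)
    top_genes = ranked[:top_k]
    if not top_genes:
        return 0.0
    # Count top genes that reach significance in the validation split.
    # Genes missing from the validation results are treated as not rediscovered.
    rediscovered = sum(1 for g in top_genes if valid_pvalues.get(g, 1.0) <= alpha)
    return rediscovered / len(top_genes)


# Toy p-values, invented purely for illustration.
train = {"geneA": 0.001, "geneB": 0.004, "geneC": 0.03, "geneD": 0.60}
valid = {"geneA": 0.010, "geneB": 0.200, "geneC": 0.04, "geneD": 0.70}

rdr_top2 = rediscovery_rate(train, valid, top_k=2)
rdr_top3 = rediscovery_rate(train, valid, top_k=3)
```

To mirror the separate treatment of highly and lowly expressed genes, the same function could be applied to two gene subsets split by some expression cutoff; the cutoff itself would be another assumption.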
Highlights
Traditional gene expression profiling with high-throughput RNA-sequencing technology measures the aggregated expression levels of genes from a collection of millions of cells.
edgeR, DESeq2 and limmatrend are designed for bulk-cell RNA-seq analysis, whereas DEsingle, monocle, BPSC, and MAST were developed for scRNA-seq data.
The false positive rate (FPR) of DEsingle varies slightly across data sets: it is liberal for the MDAMB-231 data set, conservative for the neuronal progenitor cells (NPCs) data set, and performs rather well in the other data sets.
Summary
Traditional gene expression profiling with high-throughput RNA-sequencing technology measures the aggregated expression levels of genes from a collection of millions of cells. Such bulk-cell RNA-sequencing cannot capture cellular heterogeneity since there is no cell-specific information (Miao and Zhang, 2016; Jaakkola et al., 2017). [...] biological and technical reasons, for example, lower input materials, cell-cycle phase, amplification biases, and the so-called dropout and bursting events (Dal Molin et al., 2017; Jaakkola et al., 2017; Soneson and Robinson, 2018). Such events are caused by the stochastic nature of the gene expression process at the single-cell level (Gong et al., 2018). Aggregation of expression in bulk-cell data reduces the effects of these single-cell events.