Abstract
An increasing number of studies are using single-cell RNA-sequencing (scRNA-seq) to characterize the gene expression profiles of individual cells. One common analysis applied to scRNA-seq data involves detecting differentially expressed (DE) genes between cells in different biological groups. However, many experiments are designed such that the cells to be compared are processed in separate plates or chips, meaning that the groupings are confounded with systematic plate effects. This confounding aspect is frequently ignored in DE analyses of scRNA-seq data. In this article, we demonstrate that failing to consider plate effects in the statistical model results in loss of type I error control. A solution is proposed whereby counts are summed from all cells in each plate and the count sums for all plates are used in the DE analysis. This restores type I error control in the presence of plate effects without compromising detection power in simulated data. Summation is also robust to varying numbers and library sizes of cells on each plate. Similar results are observed in DE analyses of real data where the use of count sums instead of single-cell counts improves specificity and the ranking of relevant genes. This suggests that summation can assist in maintaining statistical rigour in DE analyses of scRNA-seq data with plate effects.
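The summation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sum_counts_by_plate`, the dictionary-based data layout, and the toy counts are all hypothetical and chosen only to show the idea of collapsing single-cell counts into one count vector per plate.

```python
from collections import defaultdict


def sum_counts_by_plate(counts, plate_of_cell):
    """Sum per-cell read counts into one count vector per plate.

    counts: dict mapping cell ID -> list of read counts (one entry per gene).
    plate_of_cell: dict mapping cell ID -> plate ID.
    Returns a dict mapping plate ID -> list of summed counts per gene.
    (Hypothetical helper; names and layout are assumptions for illustration.)
    """
    if not counts:
        return {}
    n_genes = len(next(iter(counts.values())))
    sums = defaultdict(lambda: [0] * n_genes)
    for cell, gene_counts in counts.items():
        if len(gene_counts) != n_genes:
            raise ValueError(
                f"cell {cell!r} has {len(gene_counts)} genes, expected {n_genes}"
            )
        plate_sums = sums[plate_of_cell[cell]]
        # Add this cell's count for each gene to its plate's running total.
        for g, c in enumerate(gene_counts):
            plate_sums[g] += c
    return dict(sums)


# Toy data, made up for illustration: 4 cells, 3 genes, 2 plates.
toy_counts = {
    "cell1": [5, 0, 2],
    "cell2": [3, 1, 0],
    "cell3": [0, 4, 7],
    "cell4": [1, 2, 1],
}
toy_plates = {
    "cell1": "plateA",
    "cell2": "plateA",
    "cell3": "plateB",
    "cell4": "plateB",
}

plate_sums = sum_counts_by_plate(toy_counts, toy_plates)
print(plate_sums)  # {'plateA': [8, 1, 2], 'plateB': [1, 6, 8]}
```

In a DE analysis along these lines, each plate's summed vector would then be treated as a single sample, so that plates rather than individual cells act as the replicates.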
Highlights
Single-cell RNA sequencing is increasingly being used to study molecular biology at the cellular level.
The count data can be analyzed to identify cell subtypes by clustering of the gene expression profiles; to identify highly variable genes contributing to cell-to-cell heterogeneity; and to identify differentially expressed (DE) genes between groups of cells.
When the simulations are repeated without any plate effect, the observed rates for all methods are substantially closer to the specified level, if not below it. These results suggest that DE analyses will perform poorly if the plate effect is ignored.
Summary
Single-cell RNA sequencing (scRNA-seq) is increasingly being used to study molecular biology at the cellular level. RNA is isolated from individual cells and reverse-transcribed into cDNA fragments that are sequenced using massively parallel technologies (Stegle and others, 2015). Mapping of reads to a reference genome allows quantification of gene expression in each cell based on the number of reads assigned to each gene. The ability to assay expression profiles for individual cells provides scRNA-seq studies with biological resolution that cannot be matched by bulk RNA-seq experiments on cell populations. This comes at the cost of high technical noise due to difficulties in sequencing low input quantities of RNA from single cells.