A crucial step prior to analysis of single cell transcriptomic data is filtering out low-quality cells (e.g., dying cells) and artifacts (e.g., empty droplets). Choosing appropriate quality control parameters supports correct cell assignment, so that the data used for downstream analysis are high-quality and representative of all processed cell types. These parameters include high and low cut-offs for mitochondrial content, total gene counts, number of different genes detected (gene complexity), and ribosomal content per cell. They have been found to vary with cell and tissue type, as well as with biological variables such as age and disease, and technical variables such as sequencing technology. The aim of this study was to investigate the variation of these quality control metrics in a data set that included stem, progenitor, and differentiated cell types of the hematopoietic, mesenchymal, and endothelial lineages. Furthermore, we set out to determine whether applying standard cut-offs affects cell types differently. We calculated individual filtering parameters for each cell type using the R package ddqc (Subramanian et al. Genome Biol. 2022), thereby accounting for biological variability in mitochondrial content, ribosomal content, and gene counts. Notably, the distribution of each of these quality control metrics varied by cluster, and the means of certain cell types were close to commonly used cut-off values. For instance, we observed higher mitochondrial gene content in osteoblasts, sinusoidal endothelial cells, a subset of adipocyte-primed mesenchymal cells (MSPCs), hematopoietic progenitor cells, pre-B cells, and B cells. In contrast, subsets of erythroblasts and neutrophils, as well as arteriolar endothelial cells, had the lowest mitochondrial content.
In addition, lower total gene counts and lower gene complexity were found in adipocyte-primed MSPCs, neutrophils, and megakaryocyte-erythrocyte progenitors, while higher counts were found in granulocyte-monocyte progenitor cells, megakaryocyte/erythroid-primed multipotent progenitor cells, and subsets of erythroblasts. Applying universal cut-off values to this data set would greatly reduce, and in some cases eliminate, cell types that have high mitochondrial content or low gene counts and complexity, while leaving cell types at the opposite end of the spectrum unaffected. This would also skew the proportions of cells in downstream analyses, thus affecting biological interpretation of the results. Overall, our results demonstrate biological variability in the distributions of quality control metrics in a single cell transcriptomic data set that includes diverse bone marrow niche and hematopoietic cell types. This reveals novel biological information about distinct cell types and cautions that applying standard filtering parameters to such a data set would eliminate some cell types and bias downstream analysis for others. Optimization of filtering criteria will be important when broadening single cell analysis to multiplexed approaches incorporating transcriptome, genome, epigenome, and proteome-level data.
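
The contrast between a universal cut-off and cell-type-specific cut-offs can be illustrated with a minimal Python sketch. It is not the ddqc implementation or its API: the rule used here (cluster median plus a multiple of the median absolute deviation), the 10% fixed mitochondrial cut-off, the function names, and the toy values are all assumptions chosen for illustration, not data from this study.

```python
from collections import defaultdict
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def fixed_filter(cells, max_pct_mito=10.0):
    """Keep cells at or below one universal mitochondrial cut-off (assumed 10%)."""
    return [c for c in cells if c["pct_mito"] <= max_pct_mito]


def adaptive_filter(cells, n_mads=2.0):
    """Keep cells at or below median + n_mads * MAD of their own cluster.

    Assumed rule for illustration; a real pipeline may add floors or
    ceilings on the cluster-level threshold.
    """
    by_cluster = defaultdict(list)
    for cell in cells:
        by_cluster[cell["cluster"]].append(cell)

    kept = []
    for members in by_cluster.values():
        values = [c["pct_mito"] for c in members]
        cutoff = median(values) + n_mads * mad(values)
        kept.extend(c for c in members if c["pct_mito"] <= cutoff)
    return kept


def retained_counts(cells, kept):
    """Return {cluster: (cells kept, cells before filtering)}."""
    total = defaultdict(int)
    survived = defaultdict(int)
    for cell in cells:
        total[cell["cluster"]] += 1
    for cell in kept:
        survived[cell["cluster"]] += 1
    return {cluster: (survived[cluster], total[cluster]) for cluster in total}


# Hypothetical toy data: one cluster with high baseline mitochondrial
# content and one with low baseline content, each with a single outlier.
toy_pct_mito = {
    "osteoblast": [11.0, 12.0, 13.0, 12.0, 14.0, 30.0],
    "neutrophil": [1.0, 2.0, 1.5, 2.0, 1.0, 9.0],
}
cells = [
    {"cluster": cluster, "pct_mito": value}
    for cluster, values in toy_pct_mito.items()
    for value in values
]

if __name__ == "__main__":
    print("fixed:   ", retained_counts(cells, fixed_filter(cells)))
    print("adaptive:", retained_counts(cells, adaptive_filter(cells)))
```

In this toy example the fixed cut-off removes every cell of the high-mitochondrial cluster while retaining the outlier in the low-mitochondrial cluster, whereas the per-cluster rule removes only the outlier from each. The same pattern could be applied to gene counts and gene complexity with a lower bound instead of an upper bound.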