Abstract

Background
Advances in next-generation sequencing technology enable gene expression to be mapped at the single-cell level, so that single-cell RNA sequencing (scRNA-seq) can track cell heterogeneity and identify cell subpopulations. Unlike conventional RNA-seq, where differential expression analysis is the integral component, the most important goal of scRNA-seq is to identify highly variable genes across a population of cells, accounting for the discrete nature of single-cell gene expression and the uniqueness of the library preparation protocol for single-cell sequencing. However, a generic expression variation model for different scRNA-seq data sets is lacking. Hence, the objective of this study is to develop a gene expression variation model (GEVM) that uses the relationship between the coefficient of variation (CV) and the average expression level to address the over-dispersion of single-cell data, together with a corresponding statistical significance measure to identify variably expressed genes (VEGs).

Results
We built a simulation framework that generates scRNA-seq data with different numbers of cells, model parameters, and variation levels. We implemented the GEVM and demonstrated its robustness on simulated scRNA-seq data under different conditions. We evaluated the regression robustness using root-mean-square error (RMSE) and assessed the parameter estimation process by varying initial model parameters that deviated from a homogeneous cell population. We also applied the GEVM to real scRNA-seq data to test its performance under distinct cases.

Conclusions
We propose a gene expression variation model that can be used to determine significant variably expressed genes. Applying the model to simulated single-cell data, we observed robust parameter estimation under different conditions with minimal RMSE. We also tested the model on two distinct scRNA-seq data sets generated with different single-cell protocols and determined the VEGs. Obtaining the VEGs allowed us to observe possible subpopulations, providing further evidence of cell heterogeneity. With the GEVM, significant variably expressed genes can be readily identified in different scRNA-seq data sets.
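The two quantities the abstract relies on, the per-gene coefficient of variation against the average expression level and the RMSE used to judge a fit, can be illustrated with a minimal sketch. This is not the authors' GEVM, whose functional form is not given here. The function names, the toy expression values, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def gene_mean_cv(counts):
    """Return {gene: (mean, cv)} for a {gene: [expression per cell]} mapping.

    Assumption: CV is the population standard deviation divided by the mean.
    Genes with zero mean are skipped because their CV is undefined.
    """
    stats = {}
    for gene, values in counts.items():
        mu = mean(values)
        if mu == 0:
            continue
        stats[gene] = (mu, pstdev(values) / mu)
    return stats


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(mean(squared))


# Hypothetical toy data: three genes measured in a handful of cells.
counts = {
    "geneA": [2, 4, 4, 4, 5, 5, 7, 9],  # variable expression
    "geneB": [0, 0, 0, 0],              # never detected, so it is skipped
    "geneC": [10, 10, 10, 10],          # constant expression, CV of zero
}
stats = gene_mean_cv(counts)
for gene, (mu, cv) in stats.items():
    print(gene, mu, round(cv, 3))
```

In a workflow of the kind the abstract describes, the (mean, CV) pairs would be the input to a regression, and `rmse` would compare the observed CVs with the values predicted by the fitted curve.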

Highlights

  • The advancement of the next-generation sequencing technology enables mapping gene expression at the single-cell level, capable of tracking cell heterogeneity and determination of cell subpopulations using single-cell RNA sequencing

  • Single-cell quantitative PCR [2,3,4] or single-molecule RNA fluorescence in situ hybridization (FISH) [5] have been widely used as low-throughput approaches to measure the expression of specific genes at a single-cell level

  • Bulk RNA sequencing (RNA-seq) experiments start with a large population of cells (> 10^5), and the gene expression levels are considered as the average expression across the population of a cell pool [6]


Summary

Introduction

Advances in next-generation sequencing technology enable gene expression to be mapped at the single-cell level, so that single-cell RNA sequencing (scRNA-seq) can track cell heterogeneity and identify cell subpopulations. Single-cell quantitative PCR (qPCR) [2,3,4] and single-molecule RNA fluorescence in situ hybridization (FISH) [5] have been widely used as low-throughput approaches to measure the expression of specific genes at the single-cell level. Although experiments using these methods can provide crucial information on cellular heterogeneity and the presence of distinct cell subpopulations, only a small number of genes can be monitored simultaneously. With the recent introduction of the Smart-seq protocol, the required volume of starting material has been vastly reduced, making scRNA-seq achievable [7, 8].
