Abstract

The high-throughput gene expression data generated by recent single-cell RNA sequencing (scRNA-seq) and parallel single-cell reverse transcription quantitative real-time PCR (scRT-qPCR) technologies enable biologists to study the function of the transcriptome at the level of individual cells. Compared with bulk RNA-seq and RT-qPCR gene expression data, single-cell data show notably distinct features, including excessive zero expression values, high variability, and clustered design. We propose to model single-cell high-throughput gene expression data using a two-part mixed model, which not only adequately accounts for these features of single-cell expression data but also provides the flexibility to adjust for covariates. An efficient computational algorithm, automatic differentiation, is used for estimating the model parameters. Compared with existing methods, our approach shows improved power for detecting differentially expressed genes in single-cell high-throughput gene expression data.

Highlights

  • Single-cell high-throughput gene expression profiling technologies, including single-cell RNA sequencing and parallel single-cell reverse transcription quantitative real-time PCR, have enabled researchers to examine mRNA expression at the resolution of individual cells, which provides further biological insight into transcriptomes and functional genomics [1–4]

  • A small constant c is added to the non-zero expression levels before taking logarithms to avoid the left skewness caused by taking logarithms on small-expression values between 0 and 1, which is often seen in RNA-seq data

  • Given the likelihood function written in the form of (4.2), ADMB calculates the Hessian matrix of the marginal likelihood function using the automatic differentiation technique, and the maximization of the marginal likelihood function is performed by first approximating the integrals using Laplace approximations and maximizing the approximated likelihood using the quasi-Newton algorithm
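The Laplace step mentioned in the last highlight can be illustrated on a toy problem. The sketch below is not ADMB and does not use the paper's likelihood (4.2): it assumes a hypothetical random-intercept logistic model for a single cluster, uses hand-derived derivatives in place of automatic differentiation, and finds the mode with plain Newton iterations. The function names, the data, and the parameter values are all illustrative. The outer quasi-Newton maximization over the model parameters is omitted; a brute-force trapezoid rule is included only to check the approximation.

```python
import math

def log_joint(b, ys, beta0, sigma):
    """Log of the integrand: Bernoulli-logit likelihood of one cluster
    times the Normal(0, sigma^2) density of its random intercept b."""
    eta = beta0 + b
    loglik = sum(y * eta - math.log1p(math.exp(eta)) for y in ys)
    logprior = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - b ** 2 / (2.0 * sigma ** 2)
    return loglik + logprior

def laplace_log_marginal(ys, beta0, sigma, max_iter=100, tol=1e-10):
    """Laplace approximation to the log of the integral of exp(log_joint) over b."""
    n, s = len(ys), sum(ys)
    b = 0.0
    # Newton iterations to locate the mode of log_joint in b.
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = s - n * p - b / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2  # always negative
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    # Curvature at the mode gives the Gaussian volume correction.
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
    return log_joint(b, ys, beta0, sigma) + 0.5 * math.log(2.0 * math.pi / -hess)

def quadrature_log_marginal(ys, beta0, sigma, lo=-10.0, hi=10.0, n_grid=20001):
    """Brute-force trapezoid rule, used only to check the approximation."""
    h = (hi - lo) / (n_grid - 1)
    vals = [math.exp(log_joint(lo + i * h, ys, beta0, sigma)) for i in range(n_grid)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))

# Hypothetical cluster: detection indicators (1 = expressed) for 8 cells.
ys = [1, 0, 1, 1, 0, 1, 0, 1]
approx = laplace_log_marginal(ys, beta0=0.0, sigma=1.0)
reference = quadrature_log_marginal(ys, beta0=0.0, sigma=1.0)
```

In a full fit, a function like `laplace_log_marginal` would be summed over clusters and passed to an optimizer over the fixed effects and variance components; that outer loop is where a quasi-Newton method would come in.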

Summary

Introduction

Single-cell high-throughput gene expression profiling technologies, including single-cell RNA sequencing (scRNA-seq) and parallel single-cell reverse transcription quantitative real-time PCR (scRT-qPCR), have enabled researchers to examine mRNA expression at the resolution of individual cells, which provides further biological insight into transcriptomes and functional genomics [1–4]. Compared with bulk RNA-seq and RT-qPCR experiments, which are usually performed on animal tissues (i.e., cell populations) and homogeneous cell lines, single-cell high-throughput gene expression data generated by scRNA-seq and scRT-qPCR have the following distinct features, as seen in the recent literature [4–6]: excessive zero expression values, high variability, and clustered design. To account for these features, we propose to model single-cell high-throughput gene expression data using a two-part mixed model. This model adequately accounts for the above features of single-cell gene expression data and provides flexibility for adjusting for covariates in the study design. The details of this model and how it can be applied to differential expression analysis of single-cell data are discussed in the rest of this paper. We demonstrate our approach by applying it to two real-world single-cell high-throughput gene expression datasets: one from scRT-qPCR and the other from scRNA-seq.
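The exact specification of the model is not reproduced in this summary. As a rough orientation only, a two-part mixed model is commonly written in the following generic form; the symbols below (cluster index i, cell index j, covariate vectors x and z, random effects a and b, small constant c) are assumptions made for illustration and may differ from the paper's notation.

```latex
% Part 1: whether the gene is detected in cell j of cluster i
\operatorname{logit}\,\Pr\left(Y_{ij} > 0 \mid a_i\right)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + a_i

% Part 2: log expression level, given that the gene is detected
\log\left(c + Y_{ij}\right) \mid \left(Y_{ij} > 0,\; b_i\right)
  = \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma} + b_i + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N\left(0, \sigma^{2}\right)

% Cluster-level random effects, possibly correlated across the two parts
(a_i, b_i)^{\top} \sim N_2\left(\mathbf{0}, \boldsymbol{\Sigma}\right)
```

In a form like this, the first part addresses the excess zeros, the random effects address the clustered design, and the covariate vectors give the flexibility to adjust for study-design variables.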

The Two-Part Mixed Model for Single-Cell Gene Expression Data
Model Fitting
Testing for Differential Expression
Evaluation of Type I Error Rates
Evaluation of Statistical Power
Application to an scRT-qPCR Dataset
Application to scRNA-seq Datasets
Discussion