Abstract

Background

In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets.

Results

The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined.

Conclusion

For high within-group gene expression variability, small pools of RNA samples are effective in reducing the variability and compensating for the loss of replicates. Unlike typical cost-saving strategies, such as reducing the sequencing depth or the number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples, alone or in conjunction with a moderate reduction of the sequencing depth, can be a good option to optimize the cost and maintain the power.
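The variance-reduction argument can be illustrated with a small simulation. The sketch below is not the count-based data generating model of the study. It assumes a simplified Gaussian model in which each pool's measured expression is the mean of its members' biological values plus independent technical noise, so the between-pool variance is roughly `sd_bio**2 / pool_size + sd_tech**2`. All function names and parameter values (`mu`, `sd_bio`, `sd_tech`) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_pool_values(n_pools, pool_size, mu=10.0, sd_bio=2.0,
                         sd_tech=0.5, seed=0):
    """Simulate one measured expression value per pool.

    Assumption (not from the source): biological values are Gaussian,
    pooling averages them, and technical noise is added once per pool.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_pools):
        # Biological expression of each individual contributing to the pool.
        individuals = [rng.gauss(mu, sd_bio) for _ in range(pool_size)]
        # Pooling equal amounts of RNA is modeled as averaging.
        pooled = statistics.fmean(individuals)
        # One library and sequencing run per pool adds technical noise.
        values.append(pooled + rng.gauss(0.0, sd_tech))
    return values


def expected_variance(pool_size, sd_bio=2.0, sd_tech=0.5):
    """Between-pool variance implied by the simplified model above."""
    return sd_bio ** 2 / pool_size + sd_tech ** 2


if __name__ == "__main__":
    for q in (1, 2, 4, 8):
        observed = statistics.variance(simulate_pool_values(20000, q))
        print(f"pool size {q}: observed {observed:.3f}, "
              f"expected {expected_variance(q):.3f}")
```

Under this simplified model only the biological component shrinks with pool size, which is consistent with the statement that pooling helps most when within-group variability is high.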

Highlights

  • In gene expression studies, ribonucleic acid (RNA) sample pooling is sometimes considered because of budget constraints or lack of sufficient input material

  • We have shown that the utility of an RNA sample pooling strategy depends on the choice of the pooling parameters, such as the pool size and the number of RNA samples

  • Because RNA sample preparation is relatively cheap, one may use as many RNA samples as possible to capture the heterogeneity of the population under study; an adequate pooling strategy then substantially reduces the cost of the subsequent, considerably more expensive steps while maintaining the power of a differential gene expression (DGE) test
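The cost argument in the last highlight can be made concrete with simple arithmetic. The sketch below assumes a hypothetical cost decomposition that is not taken from the source: a per-sample preparation cost, plus a per-library cost and a depth-dependent sequencing cost that are paid once per pool. The function name and all prices are placeholders, not real figures.

```python
def experiment_cost(n_samples, pool_size, prep_cost, library_cost,
                    seq_cost_per_million, depth_millions):
    """Total cost under a hypothetical per-sample plus per-library model.

    Every sample is prepared individually; library preparation and
    sequencing are paid once per pool (one library per pool).
    """
    if pool_size < 1 or n_samples % pool_size != 0:
        raise ValueError("n_samples must be a positive multiple of pool_size")
    n_libraries = n_samples // pool_size
    per_library = library_cost + seq_cost_per_million * depth_millions
    return n_samples * prep_cost + n_libraries * per_library


if __name__ == "__main__":
    # Placeholder prices in arbitrary currency units (assumptions).
    prices = dict(prep_cost=10, library_cost=100,
                  seq_cost_per_million=5, depth_millions=20)
    for q in (1, 2, 3):
        print(f"pool size {q}: total cost {experiment_cost(24, q, **prices)}")
```

With the number of samples held fixed, only the per-library term shrinks as the pool size grows, which is why pooling saves the most when library preparation and sequencing dominate the budget.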


Summary

Introduction

RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. We comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Parallel sequencing of cDNA libraries (RNA-seq) is the gold standard for comprehensive profiling of RNA expression [1]. This type of data is used to answer various biological and medical questions, including the discovery of differentially expressed (DE) genes between experimental or biological conditions. Statistical tools for testing DGE were designed to make efficient use of that type of data. It is critical to decide whether to increase the sequencing depth to obtain more accurate measurements of gene expression levels (especially for low-abundance genes) or to increase the number of biological samples at a lower average sequencing depth [3, 8].

