Optimization of an RNA-Seq Differential Gene Expression Analysis Depending on Biological Replicate Number and Library Size.

Sophie Lamarre, Guojian Hu, Delphine Labourdette, Elise Sainderichin, Véronique Le Berre-Anton, Mondher Bouzayen, Mohamed Zouine, Pierre Frasse, Elie Maza

doi:10.3389/fpls.2018.00108

Abstract

RNA-Seq is a widely used technology that allows efficient genome-wide quantification of gene expression, for example for differential expression (DE) analysis. After a brief review of the main issues, methods and tools related to the DE analysis of RNA-Seq data, this article focuses on the impact of both the replicate number and the library size in such analyses. Because the main drawback of previous relevant studies is their lack of generality, we conducted both an analysis of a two-condition experiment (with eight biological replicates per condition), to compare the results with previous benchmark studies, and a meta-analysis of 17 experiments with up to 18 biological conditions, eight biological replicates and 100 million (M) reads per sample. As a global trend, we concluded that the replicate number has a larger impact than the library size on the power of the DE analysis, except for low-expressed genes, for which both parameters seem to have the same impact. Our study also provides new insights for practitioners aiming to enhance their experimental designs. For instance, by analyzing both the sensitivity and specificity of the DE analysis, we showed that the optimal threshold to control the false discovery rate (FDR) is approximately 2^(-r), where r is the replicate number. Furthermore, we showed that the false positive rate (FPR) is rather well controlled by all three studied R packages: DESeq, DESeq2, and edgeR. We also analyzed the impact of both the replicate number and library size on gene ontology (GO) enrichment analysis. Interestingly, we concluded that increases in the replicate number and library size tend to enhance the sensitivity and specificity, respectively, of the GO analysis. Finally, we recommend that RNA-Seq practitioners either produce a pilot data set to rigorously analyze the power of their experimental design, or use a public data set similar to the data set they expect to obtain. For individuals working on tomato research, on the basis of the meta-analysis, we recommend at least four biological replicates per condition and 20 M reads per sample to be almost sure of obtaining about 1000 DE genes if they exist.
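
The threshold rule stated in the abstract can be illustrated with a minimal Python sketch. It assumes that DE calls are made by comparing Benjamini-Hochberg adjusted p-values with the threshold 2^(-r); the function names (`fdr_threshold`, `benjamini_hochberg`, `call_de_genes`) and the example p-values are hypothetical and are not taken from the paper, which used the R packages DESeq, DESeq2, and edgeR.

```python
def fdr_threshold(replicates: int) -> float:
    """Approximate FDR threshold 2^(-r) for r biological replicates."""
    if replicates < 1:
        raise ValueError("replicates must be a positive integer")
    return 2.0 ** (-replicates)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here as the multiple-testing correction.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the
    # adjusted values monotone (non-decreasing with the raw p-values).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de_genes(p_values, replicates):
    """Flag genes whose adjusted p-value is at or below 2^(-r)."""
    alpha = fdr_threshold(replicates)
    return [q <= alpha for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    p = [0.01, 0.04, 0.03, 0.5]
    for r in (2, 4, 5):
        print(r, fdr_threshold(r), call_de_genes(p, r))
```

With these illustrative values, the threshold tightens from 0.25 at two replicates to 0.0625 at four, so a design with more replicates applies a stricter cut-off to the adjusted p-values.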

Highlights

Since its first results were published, RNA-Seq technology has been widely perceived as a revolutionary tool for transcriptomics (Wang et al., 2009).
We aim to study the impact of the replicate number and library size on the differential expression (DE) analysis of an RNA-Seq experiment involving the tomato fruit model (Solanum lycopersicum).
When the replicate number was increased from 2 to 7, the number of enriched biological process (BP) categories almost tripled. These results suggest that the enrichment stability of the BP categories depends more on the biological replicate number than on the library size.

Summary

Introduction

Since its first results were published, RNA-Seq technology has been widely perceived as a revolutionary tool for transcriptomics (Wang et al., 2009). As for any other statistical analysis, one main issue has been finding the probabilistic model that best fits the data, as well as the optimal parameter estimates of this model. Another important issue has been the need to normalize the data so that two different biological conditions can be compared correctly, by assessing and removing any potential technical and/or biological biases. The practical need to find the optimal number of biological replicates per condition and the optimal library size has been highlighted in many studies. We introduce these issues and review some widely used methods and tools for DE analysis. This review will help us choose the most relevant methods and tools for the DE analyses in the present work.

Journal: Frontiers in Plant Science	Publication Date: Feb 14, 2018
Citations: 72	License type: CC BY 4.0