Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

Xiaohong Li, Timothy E O’Toole, Eric C Rouchka, Nigel G F Cooper

doi:10.1186/s12864-020-6502-7

Open Access

Journal: BMC Genomics	Publication Date: Jan 28, 2020
Citations: 33	License type: open-access

Affiliation: University of Louisville

Abstract

Background: High-throughput RNA sequencing (RNA-seq) has evolved as an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus regarding which normalization and statistical methods are the most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially with studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives: RLE (relative log estimate), TMM (trimmed-mean M values) and UQ (upper quartile normalization). We evaluated the performance of these methods for gene-level differential expression analysis by considering the following factors: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths.

Results: Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large, and that a QL F-test performs the best given sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly given a desired sample size.

Conclusion: We found that the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depth has a minimal impact on differential gene expression analysis based on the simulated data.
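To make the idea of library size normalization concrete, here is a minimal sketch of two of the per-sample scaling factors named in the abstract. It assumes the common textbook formulations: the upper-quartile (UQ) factor is the 75th percentile of a sample's counts after dropping genes that are zero in every sample, and the RLE factor is the median ratio of a sample's counts to the per-gene geometric mean. The function names and the toy matrix are illustrative only and are not taken from the paper, edgeR or DESeq2; the UQ-pgQ2 and TMM methods are not shown.

```python
from math import exp, log
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def uq_factors(counts):
    """Per-sample 75th percentile, ignoring genes with zero counts in all samples."""
    expressed = [row for row in counts if any(c > 0 for c in row)]
    n_samples = len(counts[0])
    return [percentile([row[j] for row in expressed], 0.75)
            for j in range(n_samples)]


def rle_factors(counts):
    """Per-sample median of count / geometric mean, over genes non-zero everywhere."""
    usable = [row for row in counts if all(c > 0 for c in row)]
    geo_means = [exp(sum(log(c) for c in row) / len(row)) for row in usable]
    n_samples = len(counts[0])
    return [median(row[j] / g for row, g in zip(usable, geo_means))
            for j in range(n_samples)]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's scaling factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy matrix: rows are genes, columns are samples.
# Sample 2 is sequenced exactly twice as deeply as sample 1.
toy = [[10, 20], [0, 0], [40, 80], [100, 200], [5, 10]]
uq = uq_factors(toy)
rle = rle_factors(toy)
uq_norm = normalize(toy, uq)
```

On this toy matrix both factor sets differ by a factor of two between the samples, so dividing by either one removes the depth difference. Real data will not be this clean, which is where the choice among methods starts to matter.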

Highlights

High-throughput RNA sequencing (RNA-seq) has evolved as an important analytical tool in molecular biology
We found that UQ-pgQ2 normalization combined with an exact test from edgeR performed slightly better than trimmed-mean M (TMM) and relative log estimate (RLE) in terms of false discovery rate (FDR) when using Microarray Quality Control project (MAQC) data and simulated data
Statistical analysis of MAQC2 and MAQC3 for the combined methods: In our previous study, we evaluated the effect of normalization methods including DESeq, TMM, UQ-pgQ2 and UQ based on differentially expressed genes (DEGs) analysis using two MAQC datasets and an exact test/edgeR
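The comparisons above are stated in terms of power, specificity and FDR. As a minimal sketch, assuming a reference set of truly differentially expressed genes is available, these three quantities can be computed from the set of genes a method calls as DEGs. The function name and gene identifiers below are hypothetical and do not come from the paper.

```python
def de_metrics(called, truly_de, all_genes):
    """Power, specificity and FDR of a DEG call set against a reference truth set."""
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)               # true DEGs that were called
    fp = len(called - truly_de)               # non-DEGs that were called
    fn = len(truly_de - called)               # true DEGs that were missed
    tn = len(all_genes - called - truly_de)   # non-DEGs correctly left uncalled
    return {
        "power": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "fdr": fp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical example: 10 genes, 4 truly DE, 4 called (3 correct, 1 false positive).
genes = [f"g{i}" for i in range(1, 11)]
metrics = de_metrics(called=["g1", "g2", "g3", "g5"],
                     truly_de=["g1", "g2", "g3", "g4"],
                     all_genes=genes)
```

In an intra-group analysis, where both groups are drawn from the same condition, the truth set is empty, so every called gene counts as a false positive and only specificity is informative.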

Summary

Introduction

High-throughput RNA sequencing (RNA-seq) has evolved as an important analytical tool in molecular biology. We evaluated the performance of these methods for gene-level differential expression analysis by considering the following factors: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths. High-throughput RNA sequencing (RNA-seq) has been increasingly used in studies of genomics and transcriptomics over the last decade [1, 2]. Recent clinical studies demonstrated the utility of RNA-seq in identifying complex disease signatures via transcriptome analysis [8, 9]. Despite this utility and importance, optimal methods for analyzing RNA-seq data remain uncertain. Normalization and proper test statistics are critical steps in the analysis of RNA-seq data [15].

Similar Papers

Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments
Elie Maza ... Mohamed Zouine
Communicative & Integrative Biology | VOL. 6 | 09 Nov 2013

GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
Liangqun Lu ... Bernie J Daigle
BMC Bioinformatics | VOL. 22 | 03 Feb 2021

Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing
01 Jan 2019

Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma
Chao Wu ... Jun Zhu
BMC Bioinformatics | VOL. 14 | 01 Dec 2013
