Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing.

Farnoosh Abbas-Aghababazadeh, Qian Li, Brooke L Fridley

doi:10.1371/journal.pone.0206312

Open Access

https://doi.org/10.1371/journal.pone.0206312

Journal: PLOS ONE
Publication Date: Oct 31, 2018
Citations: 60
License type: CC BY 4.0

Affiliation: Moffitt Cancer Center, University of South Florida

Abstract

Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed to address the technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (UQ, TMM, and RLE) and across-sample normalization methods (SVA, RUV, and PCA) for RNA-Seq data, using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer study (CESC). Additionally, an extensive simulation study was completed to compare the performance of the across-sample normalization methods in estimating technical artifacts. Lastly, we investigated the reduction in degrees of freedom in the normalized data and its impact on downstream differential expression analysis results. Based on this study, the TMM and RLE library size normalization methods give similar results for the CESC dataset. In addition, the results from the simulated datasets show that the SVA (“BE”) method outperforms the other methods (SVA “Leek”, PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend not only adjusting for library size differences but also assessing known and unknown technical artifacts in the data and, if needed, completing across-sample normalization. In addition, we suggest including the known and estimated latent artifacts in the design matrix to correctly account for the loss in degrees of freedom, as opposed to completing the analysis on the post-processed normalized data.
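The recommendation about the design matrix can be illustrated with a small sketch. This is not the authors' code (their simulation code is in S1 File); it is a toy ordinary least squares example for a single gene, assuming one estimated latent artifact, 12 samples, and hypothetical names such as `group_t_stat`, `artifact`, and `y_clean`. It contrasts fitting group and artifact together with a two-step analysis that removes the artifact first and then tests the group effect as if no degrees of freedom had been spent.

```python
import numpy as np

def group_t_stat(y, design, coef_index, df_resid):
    """t statistic for one OLS coefficient, using the supplied residual df."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / df_resid
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[coef_index] / np.sqrt(cov[coef_index, coef_index])

rng = np.random.default_rng(0)
n = 12
ones = np.ones(n)
group = np.repeat([0.0, 1.0], n // 2)   # two conditions, six samples each
artifact = rng.normal(size=n)           # stand-in for one estimated latent factor
# Toy log-expression for one gene: driven by the artifact, no true group effect.
y = 5.0 + 2.0 * artifact + rng.normal(size=n)

# One-step analysis: group and artifact share one design matrix,
# so the residual df reflects every estimated coefficient.
full = np.column_stack([ones, group, artifact])
df_full = n - full.shape[1]
t_full = group_t_stat(y, full, 1, df_full)

# Two-step analysis: regress out the artifact, then test the group effect
# on the "cleaned" values while ignoring the df used in the first step.
nuisance = np.column_stack([ones, artifact])
gamma, *_ = np.linalg.lstsq(nuisance, y, rcond=None)
y_clean = y - artifact * gamma[1]
reduced = np.column_stack([ones, group])
df_naive = n - reduced.shape[1]
t_naive = group_t_stat(y_clean, reduced, 1, df_naive)

print(f"one-step: df={df_full}, t={t_full:.3f}")
print(f"two-step: df={df_naive}, t={t_naive:.3f}")
```

In this sketch the two-step analysis reports one more residual degree of freedom than the data support. With several latent factors and small sample sizes the gap grows, which is the mechanism the abstract links to inflated type I error rates.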

Highlights

Demand for revolutionary technologies to deliver fast, inexpensive, and accurate information has accelerated the development of high-throughput sequencing (HTS) technologies.
No previous study has comprehensively compared library size and across-sample normalization methods while taking into account the impact of the loss of degrees of freedom due to normalization on downstream differential expression analysis.
It is important to keep in mind that the Trimmed Mean of M-values (TMM) and Relative Log Expression (RLE) methods rely on the strong assumption that most genes are not differentially expressed (DE) [21,34].
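The assumption that most genes are not differentially expressed can be made concrete with a minimal sketch of RLE-style (median-of-ratios) size factors. This is an illustrative implementation, not the code used in the study, and the function name `rle_size_factors` and the toy count matrix are assumptions. Each sample is compared with a per-gene geometric-mean reference, and the median ratio across genes is taken as that sample's size factor; the median tracks sequencing depth only as long as DE genes are a minority.

```python
import numpy as np

def rle_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors.

    counts: genes x samples array of non-negative raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep genes with a positive count in every sample so all logs are finite.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # The per-gene log geometric mean across samples acts as a pseudo-reference.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median over genes of the sample-to-reference ratio.
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Toy data: five genes, three samples sequenced at relative depths 1, 2, and 4.
base = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
depth = np.array([1.0, 2.0, 4.0])
counts = np.outer(base, depth)
counts[0, 2] *= 10  # a single strongly DE gene in the third sample

factors = rle_size_factors(counts)
print(factors)
```

Because only one of the five toy genes is DE, the median ratio still recovers the depth differences. If most genes changed in the same direction, the median would absorb that biological signal, which is why the assumption matters.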

Summary

Introduction

Demand for revolutionary technologies to deliver fast, inexpensive, and accurate information has accelerated the development of high-throughput sequencing (HTS) technologies. In the last five years, massively parallel RNA sequencing (RNA-Seq) has allowed for the large-scale characterization of the transcriptomic landscape of cancer. Many methods have been developed that provide accurate measurements of transcript abundance [1,2] and improved transcription start site mapping [3], gene fusion detection [4], small RNA ... Normalization of RNA-Seq data ... with this information contained within the original TCGA ID. These known factors can be downloaded from MBatch, a web-based analysis tool for normalization of TCGA data developed by MD Anderson. The code used to simulate the data is included in S1 File.
