Abstract

Background: RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are routinely applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. Yet the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability that remains in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization.

Results: In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNASeq normalization procedures and concluded that transcripts per million (TPM) was the best-performing normalization method, based on its preservation of biological signal compared with the other methods tested.

Conclusion: Normalization is of vital importance for accurately interpreting the results of genomic and transcriptomic experiments. More work, however, needs to be done to optimize normalization methods for RNASeq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema, researchers can evaluate their own or future normalization methods to further improve the field of RNASeq normalization.
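For readers unfamiliar with the TPM transform named in the Results, the sketch below shows the standard calculation (length-normalize raw counts to reads per kilobase, then rescale each sample so it sums to one million). The counts_to_tpm helper and the toy numbers are illustrative assumptions, not code or data from the study.

    import numpy as np

    def counts_to_tpm(counts, gene_lengths_kb):
        # counts: genes x samples matrix of raw read counts
        # gene_lengths_kb: per-gene transcript length in kilobases
        rpk = counts / gene_lengths_kb[:, np.newaxis]   # reads per kilobase
        per_sample_sum = rpk.sum(axis=0)                # library-wise totals
        return rpk / per_sample_sum * 1e6               # each column now sums to 1,000,000

    # Toy example: 3 genes, 2 samples (numbers invented for illustration)
    counts = np.array([[100.0, 80.0],
                       [250.0, 300.0],
                       [50.0, 40.0]])
    lengths_kb = np.array([2.0, 5.0, 1.0])
    print(counts_to_tpm(counts, lengths_kb))  # each column sums to ~1e6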

Highlights

  • RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease

  • We performed two experiments to assess the validity of common normalization methods on a data set generated by the Sequencing Quality Control (SEQC) consortium

  • Normalization is of vital importance to accurately interpret the results of genomic and transcriptomic experiments since normalization controls for experimental error while preserving biological truth

Introduction

RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. We propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability that remains in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization. Previous studies have compared normalization methods to determine which method best preserves biological reality while reducing experimental noise [2]. Most of these experiments were conducted on small, heterogeneous data sets that were not collected for the specific purpose of systems-level evaluation [3, 4]. There is a lack of standardized tests with which to evaluate the normalization methods commonly employed.
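The specific tests are described later in the paper and are not reproduced in this excerpt. Purely as an illustration of how residual non-biological variability might be quantified after normalization, the sketch below computes a per-gene coefficient of variation across technical replicates of the same sample; the replicate_cv name, the replicate layout, and the use of the median are assumptions made for this example, not the authors' protocol.

    import numpy as np

    def replicate_cv(normalized, replicate_groups):
        # normalized: genes x samples matrix of normalized expression (e.g. TPM)
        # replicate_groups: list of column-index lists, one per set of technical replicates
        per_group = []
        for cols in replicate_groups:
            group = normalized[:, cols]
            mean = group.mean(axis=1)
            std = group.std(axis=1)
            with np.errstate(invalid="ignore", divide="ignore"):
                per_group.append(np.where(mean > 0, std / mean, np.nan))
        # Median CV per gene across replicate groups; lower values indicate
        # less residual technical (non-biological) variability.
        return np.nanmedian(np.column_stack(per_group), axis=1)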
