Abstract

Methods for normalization of RNA-sequencing gene expression data commonly assume equal total expression between compared samples. In contrast, scenarios of global gene expression shifts are numerous and increasing. Here we compare the performance of three normalization methods when polyA+ RNA content fluctuates significantly during zebrafish early developmental stages. As a benchmark, we used reverse transcription-quantitative PCR. The results show that reads per kilobase per million (RPKM) and trimmed mean of M-values (TMM) normalization systematically lead to biased gene expression estimates. Biological scaling normalization (BSN), designed to handle differences in total expression, showed improved accuracy compared to the two other methods in estimating transcript level dynamics. The results have implications for past and future studies using RNA-sequencing on samples with different levels of total or polyA+ RNA.

Highlights

  • RNA sequencing (RNA-seq) is frequently used for global gene expression analysis

  • We demonstrate a clear advantage of using an approach mimicking the polyA+ RNA levels (BSN), compared to methods aimed at making samples similar (RPM and Trimmed Mean of M-values (TMM) normalization)

  • Our results show that the normalized expression values were consistently best approximated by Biological scaling normalization (BSN) when compared to a Reverse transcription (RT)-qPCR benchmark

Introduction

RNA sequencing (RNA-seq) is frequently used for global gene expression analysis. RNA-seq generates short reads from fragmented RNA molecules and the number of reads is proportional to the abundance and length of the transcripts [1]. Among normalization methods published are the well-known “reads per kilobase of transcripts per million mapped reads” (RPKM) [4] and the less frequently used median and quantile normalization methods (reviewed in [2]). Another strategy, presented by Robinson and Oshlack [5], introduces a scaling factor called Trimmed Mean of M-values (TMM), which aims to represent the “global fold-change”. Application of this method results in samples of similar total expression, which may not be biologically correct.
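
To make the concern above concrete, the following is a minimal sketch of the standard RPKM calculation (read count × 10^9, divided by transcript length in base pairs times the total mapped reads). The gene names, lengths, and counts are hypothetical toy values, and the assumption that a uniform doubling of transcript content shows up as a doubling of every read count is a simplification made only for illustration; it is not the analysis pipeline used in the study.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Return RPKM per gene for a single sample.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    total_mapped = sum(read_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_mapped)
        for gene, count in read_counts.items()
    }


# Hypothetical toy data (not from the study): lengths in base pairs.
lengths = {"geneA": 1000, "geneB": 2000}

# Assumption for illustration: sample2 carries twice as much of every
# transcript as sample1, so every read count is doubled.
sample1 = {"geneA": 100, "geneB": 300}
sample2 = {"geneA": 200, "geneB": 600}

rpkm1 = rpkm(sample1, lengths)
rpkm2 = rpkm(sample2, lengths)

# Dividing by the total mapped reads cancels the uniform shift,
# so both samples end up with identical RPKM values.
print(rpkm1)
print(rpkm2)
```

In this toy case the two samples receive identical RPKM values even though the second contains twice as much of every transcript, which is the kind of global shift that per-sample scaling to a common total cannot represent.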
