An integrative method to normalize RNA-Seq data.

Cyril Filloux, Philippe Romain, Rocha Dominique, Meersseman Cédric, Maftah Abderrahman, Klopp Christophe, Forestier Lionel, Petit Daniel

doi:10.1186/1471-2105-15-188

Abstract

Background: Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method.

Results: We present a novel methodology to normalize RNA-Seq data, taking into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that transcripts shorter than 600 bp have an underestimated expression level, while the expression of longer transcripts is overestimated, increasingly so as their length increases. Second, as is well known, the higher the GC content (>50%), the more transcript expression is underestimated. Third, we demonstrated that the sequencing depth impacts the size bias, and we proposed a correction allowing the comparison of expression levels among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All the steps are automated in a program written in Perl and available on request.

Conclusions: The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.

Highlights

Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect the quantification.
With Illumina technology, (i) a cDNA library from a given tissue is randomly fragmented by sonication, (ii) specific adapters are ligated to assign each fragment to the corresponding sample, (iii) PCR amplification is performed, and (iv) amplified fragments with sizes ranging from 250 to 450 bp are isolated before being sequenced.
As qRT-PCR quantifications were used to validate our RNA-Seq normalization method, it was necessary to verify that the qRT-PCR data were not subject to transcript size and GC content biases.
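
The check described in the last highlight can be illustrated with a small sketch. This is not the authors' Perl program: it is a minimal Python example that assumes a simple Pearson correlation is an acceptable way to look for a dependence of qRT-PCR measurements on transcript size or GC content. The function names `gc_content` and `pearson` and all the values in the usage lines are hypothetical, chosen only for illustration.

```python
from math import sqrt


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Hypothetical example data (not taken from the article):
# transcript lengths in bp, GC fractions, and qRT-PCR expression values.
lengths_bp = [450, 900, 1500, 2400, 5200]
gc_fractions = [0.41, 0.55, 0.48, 0.62, 0.50]
qpcr_expression = [3.2, 1.1, 4.8, 2.0, 3.9]

# Coefficients close to 0 would suggest no linear dependence of the
# qRT-PCR values on transcript size or GC content in this toy data set.
r_length = pearson(lengths_bp, qpcr_expression)
r_gc = pearson(gc_fractions, qpcr_expression)
```

A rank-based coefficient such as Spearman's could be substituted if the relationship is expected to be monotonic rather than linear; the source does not state which statistic was used.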

Summary

Introduction

Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect the quantification. RNA-Seq makes it possible to obtain both the sequence and the quantification of all transcribed genes, except those expressed at extremely low levels [1]. As shown by these authors, this method differs from microarrays, which are limited by (i) the difficulty of designing specific probes, which leads to artifacts caused by cross-hybridization, and (ii) the impossibility of detecting the expression of non-annotated genes. The second step of quantification consists of removing four biases affecting read counts: (i) the number of reads increases with the size of the transcript [2,3,4,5,6], (ii) it also increases with the amount of the cDNA library [7,8], (iii) sequencing efficiency decreases when the GC content is too low or too high [9,10,11,12], and (iv) because of the PCR amplification step during library preparation, PCR duplicates occur when two copies of the same cDNA fragment produce different clusters on the flow cell [13,14,15].
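
To make biases (i) and (ii) concrete, the sketch below applies the widely used RPKM scaling (reads per kilobase of transcript per million mapped reads), which divides each read count by transcript length and by the total number of mapped reads. This is a generic baseline and not the correction proposed in this article, which also accounts for GC content and for the effect of sequencing depth on the size bias. The function name `rpkm` and the example counts are assumptions made for illustration.

```python
from typing import Dict


def rpkm(read_counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts by transcript length and sequencing depth.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    return {
        transcript: count * 1e9 / (lengths_bp[transcript] * total_reads)
        for transcript, count in read_counts.items()
    }


# Hypothetical counts: transcript "b" is three times longer than "a" and
# collects three times as many reads, so both get the same RPKM value.
counts = {"a": 100, "b": 300}
lengths = {"a": 1000, "b": 3000}
normalized = rpkm(counts, lengths)
```

Because this scaling is strictly proportional to length, it cannot capture a size bias that differs between short and long transcripts, which is the kind of effect the abstract reports for transcripts shorter than 600 bp.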

Journal: BMC Bioinformatics	Publication Date: Jun 14, 2014
Citations: 49	License type: cc-by