Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.

Yanzhu Lin, Kseniya Golovnina, Hang Noh Lee, Hina Sultana, Zhen-Xia Chen, Susan T Harbison, Brian Oliver, Yazmin L Serrano Negron

doi:10.1186/s12864-015-2353-z

Abstract

Background: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set.

Results: We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, the underlying data distribution assumption, and the number of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different from those obtained under the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences among the three-way interaction of genotype, environment, and sex.

Conclusions: The best analysis approach to our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting it to the negative binomial distribution. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2353-z) contains supplementary material, which is available to authorized users.
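
The DESeq normalization named in the Conclusions is commonly described as a median-of-ratios scaling: each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. The following is a minimal sketch of that idea in plain Python, assuming a small genes-by-samples table of raw counts. The function names `size_factors` and `normalize` are illustrative only and are not taken from the paper or from the DESeq package.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw read counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has nonzero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

In this sketch, a sample sequenced twice as deeply as another receives a size factor twice as large, so the normalized counts of an unchanged gene agree across the two samples.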

Highlights

A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist.
Application of low gene expression threshold: It is well known that RNA-Seq read counts, which are presumed to be the signal of gene expression, contain a certain degree of uncertainty.
The total counts (TC), upper quartile (UQ), DESeq, and reads per kilobase per million mapped reads (RPKM) normalization methods were robust to filtering strategy.
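
Two of the normalization methods listed above can be written down in a few lines. The sketch below is illustrative and makes two assumptions: total count normalization rescales each sample to the mean library size (one common convention, not necessarily the one used in the paper), and RPKM uses its standard definition with gene length in base pairs. The function names are hypothetical.

```python
def total_count_normalize(counts):
    """Scale each sample so all samples share the mean library size.

    counts: list of genes, each a list of raw read counts (one per sample).
    Assumes every sample has at least one read.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    mean_lib = sum(lib_sizes) / n_samples
    return [
        [c * mean_lib / lib_sizes[j] for j, c in enumerate(gene)]
        for gene in counts
    ]


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, a gene of 2,000 bp with 500 reads in a library of 10 million mapped reads has 500 / (2 kb × 10 million reads) = 25 RPKM.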

Summary

Introduction

A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. To account for over-dispersion, a generalized linear model (GLM) using a negative binomial distribution has been proposed [13, 14]. These issues leave the experimenter with several choices to make regarding data analysis: 1) which read counts to include in the analysis and which to discard; 2) which normalization methods will mitigate bias across samples; and 3) the best choice of statistical model to identify differentially expressed genes. Systematic comparisons of these choices as well as other experimental parameters have been made previously with microarray data. Comparison studies such as these have provided biologists with the most critical parameters to consider when designing microarray gene expression studies and analyzing the results.
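
Over-dispersion here means that the variance of the counts exceeds their mean, which a Poisson model cannot capture. Under the usual negative binomial parameterization, the variance is mean + dispersion × mean². The sketch below estimates the dispersion for one gene by the method of moments from replicate counts. It is a crude illustration only and is not the estimator implemented in edgeR or DESeq; the function name `moment_dispersion` is hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(replicate_counts):
    """Method-of-moments dispersion for one gene.

    Solves variance = mean + dispersion * mean**2 for the dispersion,
    using the sample mean and sample variance of the replicates.
    Requires at least two replicates. Returns 0.0 when the counts are
    no more variable than a Poisson model would predict.
    """
    m = mean(replicate_counts)
    if m == 0:
        return 0.0
    v = variance(replicate_counts)
    return max((v - m) / (m * m), 0.0)
```

A dispersion of zero corresponds to the Poisson case, and larger values indicate more variability among biological replicates than sampling noise alone would produce.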



Journal: BMC Genomics	Publication Date: Jan 5, 2016
Citations: 180	License type: CC BY 4.0


