Abstract

Background: The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. RNA-Seq provides a far more precise measurement of transcript levels and their isoforms than other methods such as microarrays. A fundamental goal of RNA-Seq is to better identify expression changes between different biological or disease conditions. However, existing methods for detecting differential expression from RNA-Seq count data have not been comprehensively evaluated in large-scale RNA-Seq datasets. Many of them suffer from inflated type I error and fail to control the false discovery rate, especially in the presence of abnormally high sequence read counts in RNA-Seq experiments.

Results: To address these challenges, we propose a powerful and robust tool, termed deGPS, for detecting differential expression in RNA-Seq data. This framework contains new normalization methods based on generalized Poisson distribution modeling of sequence count data, followed by permutation-based differential expression tests. We systematically evaluated our new tool in simulated datasets from several large-scale TCGA RNA-Seq projects, unbiased benchmark data from the compcodeR package, and real RNA-Seq data from the development transcriptome of Drosophila. deGPS can precisely control type I error and false discovery rate for the detection of differential expression and is robust in the presence of abnormally high sequence read counts in RNA-Seq experiments.

Conclusions: Software implementing deGPS was released as an R package with parallel computations (https://github.com/LL-LAB-MCW/deGPS). deGPS is a powerful and robust tool for data normalization and detecting differential expression in RNA-Seq experiments. Beyond RNA-Seq, deGPS has the potential to significantly enhance future data analysis efforts from many other high-throughput platforms such as ChIP-Seq, MBD-Seq and RIP-Seq.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1676-0) contains supplementary material, which is available to authorized users.
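To make the idea of a permutation-based differential expression test concrete, the following is a minimal sketch in Python. It is not the deGPS procedure: the abstract does not specify the test statistic or the permutation scheme, and deGPS itself is an R package. The function name `permutation_pvalue`, the absolute difference in group means as the statistic, and the example counts are all assumptions chosen for illustration, and the sketch skips the generalized Poisson normalization step entirely.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for a difference in group means.

    Hypothetical helper for illustration only; the statistic used here
    (absolute difference in means) is an assumption, not deGPS's statistic.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values reassigns sample labels at random,
        # which simulates the null hypothesis of no group difference.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1

    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up normalized counts for one transcript in two conditions.
control = [10, 12, 11, 13, 9]
treated = [100, 110, 95, 105, 98]
print(permutation_pvalue(control, treated))
```

Because the null distribution is built from the data rather than from a parametric model, a test of this kind does not depend on the counts following any particular distribution, which is one reason permutation tests are attractive when a few samples carry extreme read counts.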

Highlights

  • The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA sequencing (RNA-Seq)) at unprecedented speed and very low cost

  • In RNA-Seq experiments, millions of short sequence reads are aligned to a reference genome and the number of reads that fall into a particular genomic region is recorded as read count data

  • These regions of interest are annotated as microRNA, small interfering RNAs, long noncoding RNAs, or messenger RNA in the context of an RNA-Seq experiment, here all referred to as transcripts

Correspondence: pliu@mcw.edu; yanlu76@zju.edu.cn. Affiliations: (2) Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA; (3) Department of Gynecologic Oncology, The Affiliated Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310029, China. Full list of author information is available at the end of the article.


Summary

Introduction

The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. In RNA-Seq experiments, millions of short sequence reads are aligned to a reference genome and the number of reads that fall into a particular genomic region is recorded as read count data. These regions of interest are annotated as microRNA (miRNA), small interfering RNAs (siRNA), long noncoding RNAs (lncRNA), or messenger RNA (mRNA) in the context of an RNA-Seq experiment, here all referred to as transcripts. A major objective of RNA-Seq is to better identify count-based expression changes between different biological or disease conditions. A major challenge in differential expression analysis of RNA-Seq data is the unexpectedly large variability of sequence count data among transcripts.
