Abstract

BackgroundRNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. The comparative analyses of the data require appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions.ResultsWe developed a novel nonparametric empirical Bayesian-based approach (NPEBseq) to model the RNA-seq data. The prior distribution of the Bayesian model is empirically estimated from the data without any parametric assumption, and hence the method is “nonparametric” in nature. Based on this model, we proposed a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential usage of exons from RNA-seq data. The evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets and comparison with three popular methods showed improved results for experiments with or without biological replicates.ConclusionsNPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets. In addition, NPEBSeq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and R package are available at http://bioinformatics.wistar.upenn.edu/NPEBseq.

Highlights

  • RNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts

  • The advent of massive parallel sequencing, popularly known as Next-Generation Sequencing (NGS), is allowing whole genomes and transcriptomes to be sequenced with extraordinary speed and accuracy, providing insights into the bewildering complexity of gene expression at both gene and isoform levels [1]

  • The expression level of genes with low read counts is estimated by borrowing information from the gene expression in the whole sample

Read more

Summary

Introduction

RNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. Estimating the transcript-level expression from the collection of counts of short read sequences that map to exons (or exon slices) and exon junctions is a computationally challenging problem, which has been recently attempted by us and others, in programs such as IsoformEx [3], rSeq [4], Cufflinks [5], RSEM [6], BASIS [7], and GPSeq [8] None of these methods showed good agreement with qRT-PCR measurements, a gold standard in measuring differential RNA abundance between samples [3]. There is a critical need for investigation of other statistical methods for normalization and differential expression analysis of RNA-seq data across different conditions

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call