Abstract
RNA-Seq is increasingly used for gene expression profiling. In this approach, sequencing is performed on next-generation sequencing (NGS) platforms. Owing to the highly parallel nature of these platforms, millions of reads are generated in a short time and at low cost. Analyzing data on this scale is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from it. Here, we assessed three types of normalization (transcript parts per million, trimmed mean of M values, and quantile normalization) and evaluated whether normalization reduces technical variability across replicates. We also proposed two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method and (ii) a Bayesian method. The proposed methods were tested on three real datasets, where they performed at least as well as, and often better than, existing methods for differential expression analysis.
Highlights
One of the recent methods for gene expression profiling is RNA-Seq
An advantage of RNA-Seq over other gene expression profiling technologies is that it allows a comprehensive assay that does not require probes for targets to be specified in advance
Various normalization procedures for RNA-Seq have been proposed in the literature; here we evaluate three of them: (1) transcripts parts per million, (2) trimmed mean of M values, and (3) quantile normalization
Summary
One of the recent methods for gene expression profiling is RNA-Seq. An advantage of RNA-Seq over other gene expression profiling technologies is that it allows a comprehensive assay that does not require probes for targets to be specified in advance. Before biologically significant RNAs can be detected, systematic technical variation due to experimental variability needs to be removed while retaining the effects that result from the biological process of interest. Various procedures for normalizing RNA-Seq data have been proposed in the literature, such as transcripts parts per million [2], trimmed mean of M values [3], and quantile normalization [4]. Although these methods are frequently used, no comparative analysis has been presented so far. We also propose two statistical methods for inferring differential expression from RNA-Seq data: a likelihood ratio method and a Bayesian method. Results, along with a systematic comparison, are presented on three real datasets, and we conclude with a brief discussion.
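To make the idea of normalization concrete, the following is a minimal sketch of two of the simpler schemes named above, written with NumPy. It is an illustration under stated assumptions, not the procedure used in this work: the function names `per_million` and `quantile_normalize` are hypothetical, the per-million scaling simply divides by each sample's total count (it ignores transcript length and may differ from the definition in [2]), and the quantile step breaks ties by position rather than averaging them. The count matrix is made-up toy data with genes in rows and samples in columns.

```python
import numpy as np


def per_million(counts):
    """Scale each sample (column) so that its values sum to one million.

    Assumption: plain library-size scaling; transcript length is ignored.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6


def quantile_normalize(counts):
    """Give every sample (column) the same empirical distribution.

    Each column is sorted, the sorted columns are averaged row by row to
    build a reference distribution, and each value is replaced by the
    reference value of the same rank. Ties are broken by position.
    """
    counts = np.asarray(counts, dtype=float)
    # order[i, j] is the row index of the i-th smallest value in column j
    order = np.argsort(counts, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(counts, order, axis=0)
    reference = sorted_vals.mean(axis=1)
    normalized = np.empty_like(counts)
    # Write reference[i] at the position of the i-th smallest value per column
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


# Toy data (hypothetical): 3 genes x 2 samples
counts = np.array([[5, 4],
                   [2, 8],
                   [9, 1]])

scaled = per_million(counts)
qn = quantile_normalize(counts)
print(scaled.sum(axis=0))  # each column sums to one million
print(qn)
```

After quantile normalization both columns contain the same set of values (here 1.5, 4.5, and 8.5), arranged according to each gene's rank within its own sample, so differences in the overall count distribution between samples are removed.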