Conventional gene expression quantification approaches, such as microarrays or quantitative PCR, have similar variations of estimates for all genes. However, next-generation short-read or long-read sequencing use read counts to estimate expression levels with much wider dynamic ranges. In addition to the accuracy of estimated isoform expression, efficiency, which measures the degree of estimation uncertainty, is also an important factor for downstream analysis. Instead of read count, we present DELongSeq, which employs information matrix of EM algorithm to quantify uncertainty of isoform expression estimates to improve estimation efficiency. DELongSeq uses random-effect regression model for the analysis of DE isoform, in that within-study variation represents variable precision in isoform expression estimation and between-study variation represents variation in isoform expression levels across samples. More importantly, DELongSeq allows 1 case versus 1 control comparison of differential expression, which has specific application scenarios in precision medicine (such as before versus after treatment, or tumor versus stromal tissues). Through extensive simulations and analysis of several RNA-Seq datasets, we show that the uncertainty quantification approach is computationally reliable, and can improve the power of differential expression (DE) analysis of isoforms or genes. In summary, DELongSeq allows for efficient detection of differential isoform/gene expression from long-read RNA-Seq data.
Read full abstract