Comprehensive evaluation of RNA-seq quantification methods for linearity

Haijing Jin,Zhandong Liu,Ying-Wooi Wan

doi:10.1186/s12859-017-1526-y

Haijing Jin, Zhandong Liu + Show 1 more

Open Access

https://doi.org/10.1186/s12859-017-1526-y

Copy DOI

Abstract

BackgroundDeconvolution is a mathematical process of resolving an observed function into its constituent elements. In the field of biomedical research, deconvolution analysis is applied to obtain single cell-type or tissue specific signatures from a mixed signal and most of them follow the linearity assumption. Although recent development of next generation sequencing technology suggests RNA-seq as a fast and accurate method for obtaining transcriptomic profiles, few studies have been conducted to investigate best RNA-seq quantification methods that yield the optimum linear space for deconvolution analysis.ResultsUsing a benchmark RNA-seq dataset, we investigated the linearity of abundance estimated from seven most popular RNA-seq quantification methods both at the gene and isoform levels. Linearity is evaluated through parameter estimation, concordance analysis and residual analysis based on a multiple linear regression model. Results show that count data gives poor parameter estimations, large intercepts and high inter-sample variability; while TPM value from Kallisto and Salmon shows high linearity in all analyses.ConclusionsSalmon and Kallisto TPM data gives the best fit to the linear model studied. This suggests that TPM values estimated from Salmon and Kallisto are the ideal RNA-seq measurements for deconvolution studies.

Highlights

Deconvolution is a mathematical process of resolving an observed function into its constituent elements
To further assess the linearity of the RNA-seq quantifications, we evaluated the fitted model’s prediction through three analyses: 1) a concordance analysis between measured C and fitted value C, 2) a receiver operating characteristic (ROC) curve-like analysis on the absolute residual values, and 3) residual analysis for rescaled model
We conducted a comprehensive study to assess the linearity of gene and isoform abundance reported by different RNA-seq quantification methods based on the performance how these quantifications fitted in a multiple regression linear model

Summary

Introduction

Deconvolution is a mathematical process of resolving an observed function into its constituent elements. In the field of biomedical research, deconvolution analysis is applied to obtain single cell-type or tissue specific signatures from a mixed signal and most of them follow the linearity assumption. Recent development of generation sequencing technology suggests RNA-seq as a fast and accurate method for obtaining transcriptomic profiles, few studies have been conducted to investigate best RNA-seq quantification methods that yield the optimum linear space for deconvolution analysis. Next-generation sequencing based technology for RNA profiling (RNA-seq) has become the predominant method to quantify the transcript abundance in cells. In this study, we conducted a comprehensive comparison of seven RNA-seq quantification methods on the linearity of the estimated abundance using a deep sequencing dataset where RNA samples were mixed at known proportions. Our results will provide a good recommendation to researchers considering deconvolution on RNA-seq data

Objectives

Methods

Results

Discussion

Conclusion