TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository

Yingdong Zhao, Mariam M Konaté, Chris Karlovich, Ming-Chung Li, Biswajit Das, James H Doroshow, P Mickey Williams, Li Chen, Yvonne A Evrard, Lisa M McShane

doi:10.1186/s12967-021-02936-w

Abstract

Background

In order to correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. Several quantification measures have been proposed and remain in use, but no consensus has been reached on the best gene expression quantification measure for RNA-seq data analysis.

Methods

In the present study, we used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI Patient-Derived Models Repository (PDMR). We compared the reproducibility across replicate samples of TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts, using the coefficient of variation, the intraclass correlation coefficient, and cluster analysis.

Results

Hierarchical clustering of normalized count data tended to group replicate samples from the same PDX model together more accurately than clustering of TPM or FPKM data. Furthermore, compared with TPM and FPKM data, normalized count data had the lowest median coefficient of variation (CV) and the highest intraclass correlation coefficient (ICC) values across all replicate samples from the same model and for the same gene across all PDX models.

Conclusion

We provide compelling evidence for a preferred quantification measure for downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. Our findings are consistent with what others have shown for human tumors and cell lines and add further support to the thesis that normalized counts are the best choice for the analysis of RNA-seq data across samples.
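The two reproducibility metrics named above can be sketched in Python. This is a minimal illustration, not the authors' code: the Abstract does not state which ICC form was used, so the one-way random-effects ICC(1) below, which tolerates models with different numbers of replicates, is an assumption, as are the hypothetical function names `coefficient_of_variation` and `icc_oneway`. Whether expression values were log-transformed before computing CV or ICC is also not stated here.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV of one gene across the replicate samples of one PDX model.

    CV = sample standard deviation / mean (NaN if the mean is zero).
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    return values.std(ddof=1) / mean if mean != 0 else np.nan


def icc_oneway(groups):
    """One-way random-effects ICC(1) for one gene (assumed ICC form).

    groups: list of 1-D arrays, one per PDX model, each holding the gene's
    expression in that model's replicate samples (sizes may differ).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                              # number of models
    n_i = np.array([len(g) for g in groups])     # replicates per model
    total = n_i.sum()
    grand_mean = np.concatenate(groups).mean()
    # Between-model and within-model sums of squares (one-way ANOVA)
    ss_between = sum(n * (g.mean() - grand_mean) ** 2 for n, g in zip(n_i, groups))
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total - k)
    # Effective replicates per model; equals n when every model has n replicates
    n0 = (total - (n_i ** 2).sum() / total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

A high ICC means most of a gene's variance lies between models rather than between replicates of the same model, which is why a more reproducible quantification measure should push ICC up and within-model CV down.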

Highlights

The pattern of sample clustering differed depending on the gene expression quantification measure used (Fig. 1A, B).
For clustering based on a 1 − Pearson correlation distance matrix generated from transcripts per million (TPM) data (Fig. 1A, right panel), the three samples from patient-derived xenograft (PDX) model 475296252-R did not cluster together, despite being replicate samples originating from the same human tumor.
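Clustering on a 1 − Pearson correlation distance can be sketched with NumPy and SciPy. Several choices here are assumptions not stated in this section: expression is log2(x + 1)-transformed before correlation, average linkage is used, and the tree is cut into a fixed number of clusters. `cluster_samples` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_samples(expr, n_clusters, method="average"):
    """Hierarchically cluster samples using 1 - Pearson correlation.

    expr: genes x samples matrix (TPM, FPKM, or normalized counts).
    Assumptions: log2(x + 1) transform and average linkage.
    """
    log_expr = np.log2(np.asarray(expr, dtype=float) + 1.0)
    # np.corrcoef treats rows as variables, so transpose to samples x genes
    corr = np.corrcoef(log_expr.T)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    # linkage() expects a condensed distance vector; skip the exact-symmetry check
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method=method)
    # Cut the dendrogram into at most n_clusters flat clusters
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Replicates from one model should share a flat-cluster label; a model whose replicates split across labels, as described above for 475296252-R with TPM data, signals poor within-model reproducibility for that quantification measure.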

Summary

Introduction

In order to correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. In contrast to alignment-based methods, the transcript quantification tools Salmon, Sailfish, and kallisto were designed to increase processing speed and to reduce memory and disk usage by bypassing the creation and storage of BAM files [6,7,8]. This approach is useful for discovering novel transcripts, for sequencing poorly annotated transcriptomes, and for detecting lowly expressed genes [9]. RNA-seq isoform quantification software summarizes transcript expression levels as TPM (transcripts per million), RPKM (reads per kilobase of transcript per million reads mapped), or FPKM (fragments per kilobase of transcript per million fragments mapped); all three measures account for sequencing depth and feature length [4].
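The within-sample measures named above, and the between-sample normalized counts favored in the Abstract, can be sketched as follows. This is an illustration under stated assumptions: lengths are treated as fixed feature lengths (quantifiers such as Salmon and kallisto use effective lengths instead), the FPKM denominator is the sum of the supplied counts rather than the total number of mapped fragments, and because this section does not say how the normalized counts were produced, the median-of-ratios size factors popularized by DESeq2 are used as one common choice. The function names `fpkm`, `tpm`, and `median_of_ratios` are hypothetical.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """Fragments (or reads, giving RPKM) per kilobase per million mapped.

    Assumption: the sum of `counts` stands in for total mapped fragments.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    total_millions = counts.sum() / 1e6
    return counts / lengths_kb / total_millions


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_bp, dtype=float)
    return rate / rate.sum() * 1e6


def median_of_ratios(counts):
    """DESeq2-style size factors (assumed choice of normalization).

    counts: genes x samples matrix of raw counts.
    Returns (normalized counts, size factors).
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with nonzero counts in every sample have a finite geometric mean
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1)
    # Size factor: median ratio of each sample to the per-gene pseudo-reference
    size_factors = np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))
    return counts / size_factors, size_factors
```

Because TPM values always sum to one million within a sample, TPM for a gene equals its FPKM divided by the sample's FPKM total, times 10^6; both are scaled within each sample, whereas the size factors are estimated by comparing samples against a shared per-gene reference.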

Objectives

Methods

Results

Discussion

Conclusion