Abstract

Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible.

Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis.

Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated.

Highlights

  • RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes

  • The results indicate poor reproducibility of differential expression (DE) results, which is also clearly visible in the changes in overlap rate in Fig. 2b, d, and f

  • DE results from small sample sizes are more susceptible to heterogeneity than those from large sample sizes
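
The overlap rate mentioned above is not defined in this excerpt. As a minimal sketch, assuming it can be approximated by the Jaccard index between the DEG lists of two independent runs (shared DEGs divided by all DEGs found in either run), it could be computed as follows. The function name `overlap_rate` and the gene lists are hypothetical and purely illustrative; they are not taken from the study.

```python
def overlap_rate(degs_a, degs_b):
    """Jaccard overlap between two DEG lists (assumed definition, not the paper's)."""
    a, b = set(degs_a), set(degs_b)
    union = a | b
    if not union:
        # Neither run reported any DEGs: return 0.0 instead of dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Toy gene lists for illustration only (not data from the study).
run_1 = ["TP53", "MYC", "EGFR", "KRAS"]
run_2 = ["TP53", "MYC", "BRCA1"]

# Two genes are shared out of five distinct genes across both runs.
print(overlap_rate(run_1, run_2))
```

Under this assumed definition, a value near 1 means two replicate subsets yield almost the same DEG list, while a value near 0 means the lists are largely sample-specific.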


Summary

Introduction

RNA sequencing (RNA-Seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression in oncology, where it is used to monitor transcriptome changes and to elucidate the mechanisms of tumorigenesis and metastasis [1,2,3]. High genetic heterogeneity may greatly affect the detection of differentially expressed genes (DEGs) in RNA-Seq analysis and undermine the reliability of differential expression (DE) results. Yet the impact of tumor heterogeneity, and of the high variation in gene expression levels it causes, on the reproducibility of DE results obtained from RNA-Seq data has rarely been studied. Because RNA-Seq is used so extensively in cancer research, this potential effect urgently needs to be examined.

