Abstract

Background: Reliability and reproducibility are key metrics for gene expression assays. This report assesses the utility of the correlation coefficient in the analysis of reproducibility and reliability of gene expression data.

Results: The correlation coefficient alone is not sufficient to assess equality among sample replicates, but when it is coupled with the slope and with scatter plots, the equality of expression data can be assessed better. Scatter plots over narrow intervals should be shown as a tool to inspect the actual level of noise within the data. Here we propose a method to examine expression data reproducibility that is based on the ratios of both the means and the standard deviations of the inter-treatment expression ratios of genes. In addition, we introduce a fold-change threshold with an inter-replicate occurrence likelihood lower than 5%, which allows analysis even when reproducibility is not acceptable. A perfect correlation between transcript and protein levels cannot be expected, even in the absence of any post-transcriptional regulatory mechanism. We therefore propose adjusting protein abundance relative to transcript abundance based on open reading frame length.

Conclusions: Here, we introduce an efficient approach to assessing reproducibility. Our method detects very small changes in large datasets, which is not possible with regular correlation analysis. We also introduce a correction of protein quantities that allows post-transcriptional regulatory effects to be examined with higher accuracy.

Electronic supplementary material: The online version of this article (doi:10.1186/2241-5793-21-3) contains supplementary material, which is available to authorized users.

Highlights

  • Reliability and reproducibility are key metrics for gene expression assays

  • The correlation coefficient is not sufficient as a reproducibility assay for expression data. By analyzing a publicly available RNA-Seq dataset, we examined the reproducibility of ≈ 25,000 genes between two replicates

  • To further analyze the impact of open reading frame (ORF) length-based bias on the quantitative analysis of transcript and protein expression, we looked at the correlation of mRNA changes with protein changes between human and chimpanzee reported by Fu et al. [25]

Introduction

Reliability and reproducibility are key metrics for gene expression assays. This report assesses the utility of the correlation coefficient in the analysis of reproducibility and reliability of gene expression data. Reliability (the accuracy of the data relative to reality) and reproducibility (the inter-replicate variance) are typically assessed using the correlation coefficient. In recent work, the reliability of the output has been examined by correlation analysis using microarray and/or real-time RT-PCR data [5,6,7,8,9,10]. The correlation between replicates has been used to judge the reproducibility of data [7,10,11,12,13,14,15].
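
As a minimal illustration of why a correlation coefficient on its own can be a weak check of replicate equality, the sketch below compares two made-up replicates in which one is exactly twice the other. The values and the helper names `pearson_r` and `ols_slope` are assumptions for illustration only; they are not data or code from this report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical expression values (illustrative only): replicate B carries
# a systematic twofold bias relative to replicate A.
replicate_a = [1.0, 2.0, 4.0, 8.0, 16.0]
replicate_b = [2.0 * value for value in replicate_a]

r = pearson_r(replicate_a, replicate_b)
slope = ols_slope(replicate_a, replicate_b)

print(f"correlation = {r:.3f}, slope = {slope:.3f}")
```

In this toy case the correlation is essentially perfect while the slope is about 2, so the two replicates are strongly correlated yet far from equal. Reporting the slope alongside the correlation makes that kind of systematic difference visible.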

