Abstract

Correlation coefficients are abundantly used in the life sciences. Their use can be limited to simple exploratory analysis or to construct association networks for visualization but they are also basic ingredients for sophisticated multivariate data analysis methods. It is therefore important to have reliable estimates for correlation coefficients. In modern life sciences, comprehensive measurement techniques are used to measure metabolites, proteins, gene-expressions and other types of data. All these measurement techniques have errors. Whereas in the old days, with simple measurements, the errors were also simple, that is not the case anymore. Errors are heterogeneous, non-constant and not independent. This hampers the quality of the estimated correlation coefficients seriously. We will discuss the different types of errors as present in modern comprehensive life science data and show with theory, simulations and real-life data how these affect the correlation coefficients. We will briefly discuss ways to improve the estimation of such coefficients.

Highlights

  • Correlation coefficients are abundantly used in the life sciences

  • With the increase of comprehensive measurements (liquid-chromatography mass-spectrometry, nuclear magnetic resonance (NMR), gas-chromatography mass-spectrometry (MS) in metabolomics and proteomics; RNA-sequencing in transcriptomics) in life sciences, correlations are used as a first tool for visualization and interpretation, possibly after selection of a threshold to filter the correlations

  • Since measurement error cannot be avoided, correlation coefficients calculated from experimental data are distorted to a degree which is not known and that has been neglected in life sciences applications but can be expected to be considerable when comprehensive omics measurement are taken

Read more

Summary

Correction for realistic error

To obtain a (51) and (52), which is the corrected estimation of the correlation sample correlation calculated from the dcoateaf.fiTciheenetfρfe0,ctthoef ρ is the substituted by r in (50), correction is shown, for the realistic error model (15), in Fig. 8 where the true know error variance components have been used. It should be noted that it is possible that the corrected correlation exceeds ±1.0 This phenomenon has already been observed and discussed[16,30]: it is due to the fact that the sampling error of a correlation coefficient corrected for distortion is greater than would be that of an uncorrected coefficient of the same size (at least for the uncorrelated additive error[4,18,31]). Different approaches can be foreseen to estimate the error components which is not a trivial task, including the use of (generalized) linear mixed model[33,34], error covariance matrix formulation[29,35,36] or common factor analysis factorization[37] None of these approaches is straightforward and require some extensive mathematical manipulations to be implemented; an accurate investigation of the simulation of the error component is outside the scope of this paper and will presented in a future publication

Discussion
Findings
Material and Methods
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call