What can go wrong when observations are not independently and identically distributed: A cautionary note on calculating correlations on combined data sets from different experiments or conditions

Edoardo Saccenti

doi:10.3389/fsysb.2023.1042156

Open Access

https://doi.org/10.3389/fsysb.2023.1042156

Journal: Frontiers in Systems Biology
Publication Date: Jan 30, 2023
Citations: 3
License type: CC BY 4.0

Affiliation: Wageningen University & Research

Abstract

In the scientific literature, results are often reported in which samples from different experiments or conditions, technical replicates, or time series were merged to increase the sample size before the correlation coefficient was calculated. This practice violates two basic assumptions underlying the use of the correlation coefficient: that the observations are sampled from one population, and that they are independent (independence of errors). Because correlations are used to measure and infer associations between biological entities, this has tremendous implications for the reliability of scientific results: violating these assumptions leads to wrong and biased results. In this technical note, I review some basic properties of Pearson’s correlation coefficient and illustrate some exemplary problems with simulated and experimental data, taking a didactic approach supported by graphical examples.
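
The pooling problem described in the abstract can be sketched with a small simulation. This is an illustration written for this summary, not code or data from the paper: the function name `pooled_vs_within`, the group size, the mean shift, and the seed are all assumptions. Two groups are drawn with independent x and y, so the within-group correlation is zero in expectation, but the groups have different means, as they might under two experimental conditions.

```python
import numpy as np


def pooled_vs_within(n_per_group=500, shift=3.0, seed=0):
    """Compare within-group and pooled Pearson correlations.

    Hypothetical setup: in each group x and y are independent
    standard normals; group B is shifted by `shift` on both axes.
    """
    rng = np.random.default_rng(seed)

    # Group A: x and y independent, centred at (0, 0).
    group_a = rng.standard_normal((n_per_group, 2))
    # Group B: x and y independent, centred at (shift, shift).
    group_b = rng.standard_normal((n_per_group, 2)) + shift

    # Pearson correlation within each group separately.
    r_a = np.corrcoef(group_a[:, 0], group_a[:, 1])[0, 1]
    r_b = np.corrcoef(group_b[:, 0], group_b[:, 1])[0, 1]

    # Pearson correlation after merging the two groups into one sample.
    pooled = np.vstack([group_a, group_b])
    r_pooled = np.corrcoef(pooled[:, 0], pooled[:, 1])[0, 1]

    return r_a, r_b, r_pooled


r_a, r_b, r_pooled = pooled_vs_within()
print(f"within A: {r_a:.3f}, within B: {r_b:.3f}, pooled: {r_pooled:.3f}")
```

Under these assumed settings the within-group correlations should be close to zero, while the pooled correlation is driven entirely by the difference in group means. For two equal-sized groups with unit within-group variance and a mean shift of δ on both variables, the pooled correlation is (δ²/4) / (1 + δ²/4) in expectation, which is about 0.69 for δ = 3. The merged sample therefore suggests an association that exists in neither group.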



