Abstract

Despite an extensive literature on statistical methods and their proper application to biological data, incorrect analyses remain a critical and widespread problem in research papers. The inherently hierarchical (nested, clustered) structure of biological measurements is often erroneously neglected, leading to pseudo-replication and false-positive results. This, in turn, complicates the correct assessment of statistical power and impairs optimal planning of experiments. To draw more attention to this problem and to illustrate the importance of explicitly accounting for the nested structure of biological data, we present a simple open-source simulator of two-level, normally distributed stochastic data. By defining the ‘true’ mean values and the ‘true’ intra- and inter-cluster variances of the simulated data, users of the simulator can test various scenarios and appreciate both the importance of correct multi-level analysis and the danger of neglecting information about the data structure. Here we apply our nested data simulator to highlight some common mistakes in data analysis and propose a workflow in which the simulator can be used to correctly compare two nested groups of experimental data and to plan new experiments so as to increase statistical power when necessary.
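The idea of two-level normally distributed data can be illustrated with a minimal Python sketch. This is not the authors' simulator: the function names (`simulate_group`, `true_icc`) and all parameter values are hypothetical and chosen only for illustration. Each cluster draws its own mean around the group's ‘true’ mean (inter-cluster spread), and each measurement is then drawn around its cluster mean (intra-cluster spread).

```python
import random

def simulate_group(true_mean, sd_between, sd_within, n_clusters, n_per_cluster, rng):
    """Return a list of clusters; each cluster is a list of measurements."""
    clusters = []
    for _ in range(n_clusters):
        # Level 1: the cluster mean is shifted randomly around the group's true mean.
        cluster_mean = rng.gauss(true_mean, sd_between)
        # Level 2: individual measurements scatter around their own cluster mean.
        clusters.append([rng.gauss(cluster_mean, sd_within) for _ in range(n_per_cluster)])
    return clusters

def true_icc(sd_between, sd_within):
    """Intra-cluster correlation implied by the chosen 'true' variances."""
    return sd_between ** 2 / (sd_between ** 2 + sd_within ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
group = simulate_group(true_mean=10.0, sd_between=1.0, sd_within=2.0,
                       n_clusters=5, n_per_cluster=20, rng=rng)
print(len(group), "clusters of", len(group[0]), "measurements")
print("true ICC:", true_icc(1.0, 2.0))
```

Under these assumed settings, one fifth of the total variance comes from differences between clusters, so measurements within a cluster are correlated and cannot be treated as independent replicates.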

Highlights

  • Biological experiments often produce ‘clustered’ or ‘nested’ data

  • We show how false assumptions about data independence can lead to incorrect assessment of the statistical significance of the difference between the compared groups and how this result depends on the extent of intra-cluster correlation (ICC) of the data

  • We demonstrate how the statistical power of analysis changes depending on the number of clusters and the number of elements in them, and propose an algorithm for processing multi-level data and planning optimal experimental measurements
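The effects listed above can be explored with a small Monte Carlo sketch in Python. It is an illustration under stated assumptions, not the authors' workflow: all function names and parameter values are hypothetical, the ‘naive’ analysis is a pooled t-test that ignores clusters, and a t-test on cluster means is used here as a simple stand-in for a multi-level analysis (reasonable for balanced designs). The p-value is obtained by numerically integrating the Student's t density, so that only the standard library is needed.

```python
import math
import random

def t_two_sided_p(t, df, steps=400):
    """Two-sided Student's t p-value via Simpson integration of the density."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)  # 1 - P(-|t| < T < |t|)

def mean_and_var(x):
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)

def t_test_p(x, y):
    """Pooled-variance two-sample t-test; returns the two-sided p-value."""
    (mx, vx), (my, vy) = mean_and_var(x), mean_and_var(y)
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / df
    return t_two_sided_p((mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny)), df)

def simulate_group(mean, sd_between, sd_within, n_clusters, n_per_cluster, rng):
    """Two-level normal data: cluster means around `mean`, values around cluster means."""
    out = []
    for _ in range(n_clusters):
        cluster_mean = rng.gauss(mean, sd_between)
        out.append([rng.gauss(cluster_mean, sd_within) for _ in range(n_per_cluster)])
    return out

def rejection_rates(delta, sd_between, sd_within, n_clusters, n_per_cluster,
                    n_sim, rng, alpha=0.05):
    """Fraction of simulations with p < alpha for (naive pooled test, cluster-mean test).

    delta is the true difference between group means: delta == 0 gives the
    false-positive rate, delta != 0 gives the statistical power.
    """
    naive = nested = 0
    for _ in range(n_sim):
        a = simulate_group(0.0, sd_between, sd_within, n_clusters, n_per_cluster, rng)
        b = simulate_group(delta, sd_between, sd_within, n_clusters, n_per_cluster, rng)
        # Naive: every measurement is treated as an independent replicate.
        naive += t_test_p([v for c in a for v in c], [v for c in b for v in c]) < alpha
        # Cluster-aware: one summary value (the mean) per cluster.
        nested += t_test_p([sum(c) / len(c) for c in a], [sum(c) / len(c) for c in b]) < alpha
    return naive / n_sim, nested / n_sim

rng = random.Random(1)
# No true difference, ICC = 0.5: compare false-positive rates of the two analyses.
naive_fpr, nested_fpr = rejection_rates(0.0, 1.0, 1.0, 5, 20, n_sim=300, rng=rng)
# Same total of 100 measurements per group, split into few or many clusters.
_, power_few = rejection_rates(1.5, 1.0, 1.0, 5, 20, n_sim=300, rng=rng)
_, power_many = rejection_rates(1.5, 1.0, 1.0, 10, 10, n_sim=300, rng=rng)
print("false-positive rate (naive, cluster means):", naive_fpr, nested_fpr)
print("power with 5x20 vs 10x10 design:", power_few, power_many)
```

With these assumed settings, the pooled test rejects a true null hypothesis far more often than the nominal 5%, while the cluster-mean test stays close to it. For a fixed total number of measurements, adding clusters raises power more than adding elements per cluster.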



Introduction

Biological experiments often produce ‘clustered’ or ‘nested’ data. By these terms, we mean a hierarchical grouping of individual measurements according to a certain principle, so that the data points in each cluster are not completely independent of each other (Fig. 1). Grouping of data by the day of the experiment is one very common example of hierarchical data structure in biological experiments. Another common example is when multiple data points are collected from a smaller number of animals, patients, cells, etc. In this case, measurements from the same animal, patient, or cell may not be completely independent, so they form clusters with slightly shifted mean values.
