Abstract

BackgroundThe analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets. Results derived from such studies cannot be trusted unless they are adequately designed and reported. The purpose of this study is to assess current practices on the reporting of experimental design and statistical analyses in gene expression-based studies.MethodsWe reviewed hundreds of MEDLINE-indexed papers involving gene expression data analysis, which were published between 2003 and 2005. These papers were examined on the basis of their reporting of several factors, such as sample size, statistical power and software availability.ResultsAmong the examined papers, we concentrated on 293 papers consisting of applications and new methodologies. These papers did not report approaches to sample size and statistical power estimation. Explicit statements on data transformation and descriptions of the normalisation techniques applied prior to data analyses (e.g. classification) were not reported in 57 (37.5%) and 104 (68.4%) of the methodology papers respectively. With regard to papers presenting biomedical-relevant applications, 41(29.1 %) of these papers did not report on data normalisation and 83 (58.9%) did not describe the normalisation technique applied. Clustering-based analysis, the t-test and ANOVA represent the most widely applied techniques in microarray data analysis. But remarkably, only 5 (3.5%) of the application papers included statements or references to assumption about variance homogeneity for the application of the t-test and ANOVA. There is still a need to promote the reporting of software packages applied or their availability.ConclusionRecently-published gene expression data analysis studies may lack key information required for properly assessing their design quality and potential impact. There is a need for more rigorous reporting of important experimental factors such as statistical power and sample size, as well as the correct description and justification of statistical methods applied. This paper highlights the importance of defining a minimum set of information required for reporting on statistical design and analysis of expression data. By improving practices of statistical analysis reporting, the scientific community can facilitate quality assurance and peer-review processes, as well as the reproducibility of results.

Highlights

  • The analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets

  • Our research examined how papers present fundamental statistical justifications or assumptions for the correct application of the t-test and ANOVA, which are widely applied to gene expression data analysis

  • The reporting of the following factors was examined: a) type of study; b) reporting of methods of sample size calculation and statistical power; c) reporting of data standardisation and method of normalisation applied; d) description of data analysis techniques applied; e) discussion about missing values; f) explicit statement of directionality; g) explicit statement of hypothesis and alternative; and h) reference to software tools applied for implementing data analyses

Read more

Summary

Introduction

The analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets. Results derived from such studies cannot be trusted unless they are adequately designed and reported. The purpose of this study is to assess current practices on the reporting of experimental design and statistical analyses in gene expression-based studies. The analysis of large-scale gene expression has become a fundamental approach to functional genomics, the identification of clinical diagnostic factors and potential drug targets. A fundamental objective in microarray data analysis is to identify a subset of genes that are differentially expressed between different samples (e.g. conditions, treatments or experimental perturbations) of interest. Information on experimental design and the correct use of statistical methods is fundamental to aid the community in correctly accomplishing their interpretations and assessments

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call