Abstract

Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted technical variability. Applying global normalization methods has the potential to remove biologically driven variation. Currently, it is up to the subject matter experts to determine if the stated assumptions are appropriate. Here, we propose a data-driven alternative. We demonstrate the utility of our method (quantro) through examples and simulations. A software implementation is available from http://www.bioconductor.org/packages/release/bioc/html/quantro.html.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0679-0) contains supplementary material, which is available to authorized users.

Highlights

  • Multi-sample normalization techniques such as quantile normalization [1, 2] have become a standard and essential part of analysis pipelines for high-throughput data

  • To represent each tissue, we used multiple studies from the Gene Expression Omnibus (GEO), so that batch effects [29] between GEO studies would not be confounded with differences between tissues

  • Because global changes in distributions between groups can arise from either technical or biological variation, our test statistic Fquantro detects global differences from both sources and does not by itself distinguish between them

Introduction

Multi-sample normalization techniques such as quantile normalization [1, 2] have become a standard and essential part of analysis pipelines for high-throughput data. Some of the first attempts at normalizing microarray data mimicked the use of so-called house-keeping genes [6], as was done with the established gene expression measurement technology that preceded microarrays. This approach did not work well in practice [7, 8], so data-driven approaches were developed, such as median correction [9, 10], variance-stabilizing transformation [11], locally weighted linear regression (loess) [12] and spline-based methods [13]. We refer to these as global adjustment methods [15].
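
To make the assumption behind global adjustment concrete, the following is a minimal Python sketch of quantile normalization applied to simulated data. It is not the quantro implementation; the function name `quantile_normalize`, the simulated groups and the choice of a 2-unit shift are illustrative assumptions. The sketch forces every sample onto the same reference distribution, so a global shift between two groups disappears whether its origin is technical or biological.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a features-by-samples matrix.

    Illustrative sketch: ties are broken by order of appearance rather than
    averaged, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    # Row indices that sort each column (sample) in ascending order.
    order = np.argsort(x, axis=0, kind="stable")
    # Reference distribution: mean of each quantile across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    # The value ranked i-th within a sample receives the i-th reference quantile.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


# Simulated data (an assumption for illustration): two groups of three samples,
# where the second group carries a global shift of 2 units.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=(1000, 3))
group_b = rng.normal(loc=7.0, scale=1.0, size=(1000, 3))
raw = np.hstack([group_a, group_b])

normalized = quantile_normalize(raw)

# Difference in group means before and after normalization.
raw_gap = raw[:, 3:].mean() - raw[:, :3].mean()
norm_gap = normalized[:, 3:].mean() - normalized[:, :3].mean()
print(f"group mean difference before: {raw_gap:.3f}, after: {norm_gap:.3f}")
```

If the shift in the second group were biological, this step would remove it along with any technical variation, which is the situation a data-driven check on the normalization assumption is meant to flag.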
