Abstract

A common practice in gene expression studies is to use ‘housekeepers’, i.e., genes expected to be expressed at relatively constant levels across experimental conditions, to normalize data. The process is to divide an expression value by some composite of one or more stable housekeepers to remove the effect of processing and nuisance variables. Despite its widespread use and the esteem in which it is held, we argue that this approach is fundamentally flawed on multiple levels. The outcome of housekeeper normalization is a set of ratio variables, which are not amenable to many standard statistical tests. There are no universal housekeeper genes, and even within specific cohorts, proposed housekeeper genes often fail to replicate. Furthermore, there is no single agreed-upon algorithm for performing housekeeper normalization, nor is there agreement regarding what constitutes a good housekeeper. We urge researchers to consider the use of alternative methodologies in their research.

Highlights

  • Housekeeper normalization is a process commonly used in gene expression studies [1,2]

  • Multiple papers focus on finding good housekeeper genes for use within a given research area [1,3,4,5] while others focus on finding universal housekeepers [6]

  • A common approach taken in the field is to use housekeeper normalization

Summary

Introduction

Housekeeper normalization is a process commonly used in gene expression studies [1,2]. The basic process of housekeeper normalization is to divide a gene expression value by some composite of one or more housekeepers, i.e., constitutive genes that are believed to have a relatively constant level of expression across experimental conditions. Multiple papers focus on finding good housekeeper genes for use within a given research area [1,3,4,5], while others focus on finding universal housekeepers [6]. Despite such attempts, careful inspection of the literature across multiple conditions or disease states often turns up conflicting reports on the validity of many housekeepers, and the variability of housekeepers across samples, tissue types and physiological states is well documented [7,8]. The use of multiple housekeepers is quite common, and there is no consistent agreement about how many housekeepers are needed or precisely how to aggregate multiple housekeepers into a single composite measure. It is almost impossible for a reader to know exactly what was done when a paper reports that the authors selected and normalized to a set of housekeepers.

[…] 0.5 […] the expected correlation is approximately −1/√2, or ~−0.7071
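The two concerns raised above, ambiguity in how housekeepers are aggregated and the behaviour of ratio variables, can be illustrated with a minimal Python sketch. It is not taken from the original paper. The function names, expression values and simulation parameters are all hypothetical. The reference values of roughly 0.5 and −1/√2 are the classical first-order approximations for ratios that share a denominator, and they assume independent variables with equal coefficients of variation.

```python
import math
import random
from statistics import fmean, geometric_mean


def normalize(target, housekeepers, composite="geometric"):
    """Divide a target expression value by a composite of housekeeper values."""
    if composite == "geometric":
        denom = geometric_mean(housekeepers)
    elif composite == "arithmetic":
        denom = fmean(housekeepers)
    else:
        raise ValueError(f"unknown composite: {composite}")
    return target / denom


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Part 1: one hypothetical sample, two equally plausible composites.
target_value = 120.0
housekeeper_values = [50.0, 200.0]
by_arithmetic = normalize(target_value, housekeeper_values, "arithmetic")  # 120 / 125
by_geometric = normalize(target_value, housekeeper_values, "geometric")    # 120 / 100

# Part 2: three independent variables with the same mean and spread
# (coefficient of variation 0.1). x and y play the role of target genes,
# h the role of a single housekeeper. All values are simulated.
rng = random.Random(0)
n = 20000
x = [rng.gauss(100.0, 10.0) for _ in range(n)]
y = [rng.gauss(100.0, 10.0) for _ in range(n)]
h = [rng.gauss(100.0, 10.0) for _ in range(n)]

x_norm = [a / c for a, c in zip(x, h)]
y_norm = [b / c for b, c in zip(y, h)]

r_raw = pearson(x, y)                 # near 0: the raw variables are independent
r_normalized = pearson(x_norm, y_norm)  # near 0.5: induced by the shared denominator
r_with_housekeeper = pearson(x_norm, h)  # near -1/sqrt(2): ratio vs. its own denominator

print(f"arithmetic composite: {by_arithmetic:.3f}, geometric composite: {by_geometric:.3f}")
print(f"corr(x, y)     = {r_raw:.3f}")
print(f"corr(x/h, y/h) = {r_normalized:.3f}")
print(f"corr(x/h, h)   = {r_with_housekeeper:.3f}")
```

In this sketch the same sample gives a normalized value of 0.96 under the arithmetic composite and about 1.2 under the geometric one, so a reader who is not told which composite was used cannot reproduce the number. The simulated correlations arise purely from dividing by a common variable, which is one reason ratio variables sit poorly with standard correlation-based tests.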

Hv will be
Discussion and Conclusion