Abstract

We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort (‘housekeeping genes’) typically used for normalisation in expression profiling studies. Those genes (transcripts) that we determined to be usable as reference genes differed greatly from previous suggestions based on hypothesis-driven approaches. A limitation of this initial study is that a single (albeit large) dataset was employed for both tissues and cell lines. Here we extend this analysis to encompass seven other large datasets. Although their absolute values differ a little, the Gini values and median expression levels of the various genes are well correlated between the various cell line datasets, implying that our original choice of the more ubiquitously expressed low-Gini-coefficient genes was indeed sound. In tissues, the Gini values and median expression levels of genes showed greater variation, with the GC of genes changing with the number and types of tissues in the datasets. In all datasets, regardless of whether they were derived from tissues or cell lines, we also show that the GC is a robust measure of gene expression stability. Using the GC as a measure of expression stability, we illustrate its utility for finding tissue- and cell line-optimised housekeeping genes without any prior bias; these again include only a small number of previously reported housekeeping genes. We also independently confirmed this experimentally using RT-qPCR with 40 candidate GC genes in a panel of 10 cell lines. These were termed the Gini Genes. In many cases, the variation in the expression levels of classical reference genes is very large (e.g. 44-fold for GAPDH in one dataset), suggesting that the cure (of using them as normalising genes) may in some cases be worse than the disease (of not doing so). We recommend the present data-driven approach for the selection of reference genes by using the easy-to-calculate and robust GC.

Highlights

  • In a recent paper[1], we introduced the Gini index[2,3,4,5] as a very useful, nonparametric statistical measure for identifying those genes whose expression varied least across a large set of samples

  • It became obvious that an analysis of the Gini coefficient (GC) of the various genes was precisely what was required to assess those ‘housekeeping’ genes that varied least across a set of expression profiles, and we found 35 transcripts for which the GC was 0.15 or below when assessing 56 mammalian cell lines taken from a wide variety of tissues[1]

  • We previously identified a number of genes in the Human Protein Atlas (HPA) cell line data set[93] with very low expression variability and potential for use as reference genes[1]

Introduction

In a recent paper[1], we introduced the Gini index (or Gini coefficient, GC)[2,3,4,5] as a very useful, nonparametric statistical measure for identifying those genes whose expression varied least across a large set of samples (when normalised appropriately[6] to the total expression level of transcripts). Perhaps surprisingly[48], rather than letting the data speak for themselves, choices of candidate reference genes were often made on the basis that reference genes should be ‘housekeeping’ genes that would be assumed (‘hypothesised’) to vary comparatively little between cells, to be involved in nominally routine metabolism, and to have a reasonably high expression level (e.g. [49–66]). This is not necessarily the best strategy, and there is (see also below) quite wide variation in the expression of most standard housekeeping genes between cells or tissues (e.g. [53,62,65,67–79]). We note too that the precision of these digital methods (as with other digital, single-molecule strategies[90–92]) means that the requirement for reasonably high expression levels is much less acute.
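To make the idea of ranking genes by GC concrete, the following is a minimal sketch of how a Gini coefficient could be computed for each gene and used to shortlist candidate reference genes. It is an illustration under stated assumptions, not the authors' pipeline: the function names `gini` and `low_gini_genes`, the toy gene labels, and the expression values are all hypothetical, and the input is assumed to be already normalised, non-negative expression values (one list of per-sample values per gene). The default threshold of 0.15 mirrors the cutoff quoted in the highlights above.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    if xs[0] < 0:
        raise ValueError("expression values must be non-negative")
    # Standard rank-based formula on the ascending-sorted values x_1..x_n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def low_gini_genes(expression, threshold=0.15):
    """Return genes whose GC is at or below the threshold, most stable first.

    `expression` is a hypothetical mapping: gene name -> list of normalised
    expression values, one per sample.
    """
    scores = {gene: gini(vals) for gene, vals in expression.items()}
    selected = [g for g, s in scores.items() if s <= threshold]
    return sorted(selected, key=scores.get)


# Toy data (invented for illustration only).
toy = {
    "GENE_A": [10, 11, 9, 10],   # nearly constant across samples
    "GENE_B": [0, 0, 0, 100],    # expressed in a single sample
}
print(low_gini_genes(toy))
```

A gene expressed at the same level in every sample gives a GC of 0, while a gene expressed in only one of n samples gives (n - 1)/n, so lower values indicate more stable expression across the dataset.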

