Abstract

In functional genomics experiments, researchers often select genes to follow up or validate from a long list of differentially expressed genes. Typically, sharp thresholds are used to bin genes into groups such as significant/non-significant or fold-change above/below a cut-off value, and ad hoc criteria, such as favouring well-known genes, are also applied. Binning, however, is inefficient and does not take the uncertainty of the measurements into account. Furthermore, p-values, fold-changes, and other outcomes are treated as equally important, and relevant genes may be overlooked with such an approach. Desirability functions are proposed as a way to integrate multiple selection criteria for ranking, selecting, and prioritising genes. These functions map any variable onto a continuous 0–1 scale, where one is maximally desirable and zero is unacceptable. The individual desirabilities for the selection criteria are then combined into an overall desirability that is used to rank genes. In addition to p-values and fold-changes, other experimental results and information contained in databases can easily be included as criteria. The approach is demonstrated with a breast cancer microarray data set. The functions and an example data set are available in the desiR package on CRAN (https://cran.r-project.org/web/packages/desiR/), and the development version is available on GitHub (https://github.com/stanlazic/desiR).
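The mapping onto a 0–1 scale can be illustrated with a minimal Python sketch. It assumes simple linear ramp functions with two cut points (`cut1`, `cut2`); the function names and cut-point values are hypothetical and are not the desiR API, which is an R package. A "higher is better" function suits criteria such as absolute fold-change, and a "lower is better" function suits criteria such as p-values.

```python
def desirability_high(x, cut1, cut2):
    """Hypothetical 'higher is better' ramp: 0 at or below cut1,
    1 at or above cut2, and linear in between."""
    if cut2 <= cut1:
        raise ValueError("cut2 must be greater than cut1")
    if x <= cut1:
        return 0.0
    if x >= cut2:
        return 1.0
    return (x - cut1) / (cut2 - cut1)


def desirability_low(x, cut1, cut2):
    """Hypothetical 'lower is better' ramp: 1 at or below cut1,
    0 at or above cut2, and linear in between."""
    return 1.0 - desirability_high(x, cut1, cut2)


# Illustrative cut points only: an absolute log2 fold-change of 0.5 or less
# is unacceptable (0), and one of 2 or more is maximally desirable (1).
print(desirability_high(1.25, 0.5, 2.0))  # 0.5
# A p-value of 0.01 or less is maximally desirable; 0.1 or more is unacceptable.
print(desirability_low(0.2, 0.01, 0.1))   # 0.0
```

Unlike a hard cut-off, a value just short of `cut2` still earns nearly full credit instead of being placed in the "fail" bin.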

Highlights

  • Applying desirability functions is a three-step procedure: (1) choose the relevant variables to use as selection criteria, (2) map the values of each variable onto a continuous 0–1 scale using the appropriate desirability function, and (3) calculate the overall desirability as a weighted combination of the individual desirabilities
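The three steps can be sketched in Python as follows. The overall desirability here is a weighted geometric mean, a common choice in the desirability-function literature; the text above only says "weighted combination", so treat this as an assumption. Gene names, values, cut points, and weights are all hypothetical toy inputs, not data from the paper.

```python
import math


def desirability_high(x, cut1, cut2):
    """Hypothetical 'higher is better' linear ramp between cut1 and cut2."""
    if x <= cut1:
        return 0.0
    if x >= cut2:
        return 1.0
    return (x - cut1) / (cut2 - cut1)


def overall_desirability(ds, weights):
    """Weighted geometric mean of individual desirabilities (assumed form).
    A single zero (an unacceptable value) makes the overall value zero."""
    if len(ds) != len(weights):
        raise ValueError("need one weight per desirability")
    if any(d == 0.0 for d in ds):
        return 0.0
    log_sum = sum(w * math.log(d) for d, w in zip(ds, weights))
    return math.exp(log_sum / sum(weights))


# Toy values for three hypothetical genes (not from the breast cancer data set).
genes = {
    "geneA": {"p": 0.001, "lfc": 2.5, "mean_expr": 8.0},
    "geneB": {"p": 0.04, "lfc": -0.6, "mean_expr": 10.0},
    "geneC": {"p": 0.2, "lfc": 3.0, "mean_expr": 4.0},
}

# Step 1: criteria are p-value, absolute log2 fold-change, and mean expression.
# Step 2: map each onto 0-1 with illustrative cut points.
# Step 3: combine with weights; mean expression counts half as much.
weights = [1.0, 1.0, 0.5]
scores = {}
for name, g in genes.items():
    ds = [
        # -log10(p): p >= 0.1 maps to 0, p <= 0.001 maps to (about) 1
        desirability_high(-math.log10(g["p"]), 1.0, 3.0),
        desirability_high(abs(g["lfc"]), 0.5, 2.0),
        desirability_high(g["mean_expr"], 4.0, 8.0),
    ]
    scores[name] = overall_desirability(ds, weights)

# Rank genes from most to least desirable.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here `geneC` has the largest fold-change, but its p-value (and its low expression) maps to zero, so its overall desirability is zero and it ranks last: with a geometric mean, one unacceptable criterion cannot be offset by the others.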

Summary

Introduction

High-throughput biology experiments typically generate long lists of differentially expressed genes, proteins, metabolites, or lipids. This paper focuses on gene expression, but the methods apply equally well to all -omics technologies. After the data are analysed, the next step is often to select a subset of genes for further experiments. Although many computational methods for prioritising genes have been developed (Moreau & Tranchevent, 2012), biologists often select genes manually based on p-values, fold-changes, average expression levels, variance across samples, and other criteria. Selecting genes in this way has several shortcomings. Hard thresholds are used to dichotomise or bin variables as either expressed/not expressed, significant/not significant, or above/below a fold-change cut-off.
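The cost of a hard threshold can be shown with a small Python sketch that compares a p < 0.05 cut-off with a continuous desirability score. The -log10 transform, the cut points, and the helper names are assumptions chosen for illustration; they are not values or functions from the paper.

```python
import math


def bin_significant(p, alpha=0.05):
    """Hard threshold: a gene is either significant or not."""
    return p < alpha


def desirability_high(x, cut1, cut2):
    """Hypothetical 'higher is better' linear ramp between cut1 and cut2."""
    if x <= cut1:
        return 0.0
    if x >= cut2:
        return 1.0
    return (x - cut1) / (cut2 - cut1)


def p_desirability(p):
    """Continuous desirability of a p-value on the -log10 scale,
    with illustrative cut points p = 0.1 (maps to 0) and p = 0.001 (maps to 1)."""
    return desirability_high(-math.log10(p), 1.0, 3.0)


for p in (0.0001, 0.049, 0.051):
    print(f"p = {p}: significant = {bin_significant(p)}, "
          f"desirability = {p_desirability(p):.3f}")
```

Binning treats p = 0.0001 and p = 0.049 as equivalent while separating p = 0.049 from p = 0.051; the continuous scores do the opposite, giving the latter pair nearly equal values (both about 0.15) and the former pair clearly different ones.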
