Ranking, selecting, and prioritising genes with desirability functions

Stanley E Lazic

doi:10.7717/peerj.1444

Abstract

In functional genomics experiments, researchers often select genes to follow up or validate from a long list of differentially expressed genes. Typically, sharp thresholds are used to bin genes into groups such as significant/non-significant or fold change above/below a cut-off value, and ad hoc criteria, such as favouring well-known genes, are also used. Binning, however, is inefficient and does not take the uncertainty of the measurements into account. Furthermore, p-values, fold-changes, and other outcomes are treated as equally important, and relevant genes may be overlooked with such an approach. Desirability functions are proposed as a way to integrate multiple selection criteria for ranking, selecting, and prioritising genes. These functions map any variable to a continuous 0–1 scale, where one is maximally desirable and zero is unacceptable. Multiple selection criteria are then combined to provide an overall desirability that is used to rank genes. In addition to p-values and fold-changes, further experimental results and information contained in databases can be easily included as criteria. The approach is demonstrated with a breast cancer microarray data set. The functions and an example data set can be found in the desiR package on CRAN (https://cran.r-project.org/web/packages/desiR/) and the development version is available on GitHub (https://github.com/stanlazic/desiR).

Highlights

High-throughput biology experiments typically generate long lists of differentially expressed genes, proteins, metabolites, or lipids
This paper focuses on gene expression, but the methods apply well to all -omics technologies
Applying desirability functions is a three step procedure: (1) choose the relevant variables to be used as selection criteria, (2) map the values for each variable onto a continuous 0–1 scale using the appropriate desirability function, and (3) calculate the overall desirability as a weighted combination of the individual desirabilities
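The three-step procedure in the last highlight can be sketched in a few lines of Python. This is a minimal illustration and not the desiR implementation: the function names (`d_high`, `overall_desirability`), the gene names, the cut-off values, and the choice of a weighted geometric mean as the "weighted combination" are all assumptions made for the example.

```python
import math

def d_high(x, cut1, cut2, d_min=0.0, d_max=1.0):
    """Hypothetical 'higher is better' desirability.

    Returns d_min at or below cut1, d_max at or above cut2,
    and interpolates linearly in between.
    """
    if x <= cut1:
        return d_min
    if x >= cut2:
        return d_max
    return d_min + (d_max - d_min) * (x - cut1) / (cut2 - cut1)

def overall_desirability(ds, weights):
    """Weighted geometric mean of individual desirabilities (assumed combination rule).

    Any unacceptable criterion (d == 0) makes the overall desirability 0.
    """
    if any(d == 0 for d in ds):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(d) for d, w in zip(ds, weights)) / total)

# Step 1: choose the selection criteria (made-up values for three made-up genes).
genes = {
    "geneA": {"neg_log10_p": 4.0, "abs_log2_fc": 0.5},
    "geneB": {"neg_log10_p": 2.0, "abs_log2_fc": 2.0},
    "geneC": {"neg_log10_p": 0.5, "abs_log2_fc": 3.0},
}

scores = {}
for name, g in genes.items():
    # Step 2: map each criterion onto a 0-1 scale (cut-offs are illustrative).
    d_p = d_high(g["neg_log10_p"], cut1=1.0, cut2=3.0)
    d_fc = d_high(g["abs_log2_fc"], cut1=0.0, cut2=2.0)
    # Step 3: combine into an overall desirability with equal weights.
    scores[name] = overall_desirability([d_p, d_fc], weights=[1.0, 1.0])

# Rank genes from most to least desirable.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch a gene with a large fold change but weak statistical evidence (geneC) receives an overall desirability of zero, while a gene that is moderately good on both criteria (geneB) ranks above one that is excellent on only one (geneA).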

Summary

Introduction

High-throughput biology experiments typically generate long lists of differentially expressed genes, proteins, metabolites, or lipids. This paper focuses on gene expression, but the methods apply well to all -omics technologies. After the data are analysed, the next step is often to select a subset of genes for further experiments. While many computational methods have been developed (Moreau & Tranchevent, 2012), biologists often manually select genes based on p-values, fold-changes, average expression levels, variance across samples, and other criteria. There are, however, several shortcomings with selecting genes in this way. Hard thresholds are used to dichotomise or bin variables as either expressed/not expressed, significant/not significant, or above/below a fold-change cut-off.
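To illustrate the problem with hard thresholds, the sketch below compares a dichotomised p-value with a continuous "lower is better" mapping. The function names (`hard_bin`, `d_low`), the 0.05 threshold, and the cut-offs 0.001 and 0.1 are hypothetical choices for the example, not values taken from the paper.

```python
def hard_bin(p, alpha=0.05):
    """Dichotomise a p-value: 1.0 if 'significant', else 0.0."""
    return 1.0 if p < alpha else 0.0

def d_low(x, cut1, cut2):
    """Hypothetical 'lower is better' desirability.

    Returns 1.0 at or below cut1, 0.0 at or above cut2,
    and decreases linearly in between.
    """
    if x <= cut1:
        return 1.0
    if x >= cut2:
        return 0.0
    return (cut2 - x) / (cut2 - cut1)

# Two genes with almost identical evidence on either side of the threshold.
p_values = [0.049, 0.051]

binned = [hard_bin(p) for p in p_values]
smooth = [d_low(p, cut1=0.001, cut2=0.1) for p in p_values]

print(binned)  # the two genes land in opposite bins
print(smooth)  # the two genes receive nearly equal desirabilities
```

Under the hard threshold the two genes are treated as categorically different, whereas the continuous mapping assigns them nearly the same value, which reflects how similar the underlying evidence is.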

Journal: PeerJ	Publication Date: Nov 26, 2015
Citations: 27	License type: CC BY 4.0
