The characteristic direction: a geometrical approach to identify differentially expressed genes

Neil R Clark, Edward Y Chen, Avi Ma’Ayan, Kevin S Hu, Axel S Feldmann, Qiaonan Duan, Yan Kou

doi:10.1186/1471-2105-15-79

Open Access

Journal: BMC Bioinformatics	Publication Date: Mar 21, 2014
Citations: 196	License type: cc-by

Affiliation: Icahn School of Medicine at Mount Sinai

Abstract

Background: Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome-wide expression profiling. Typically, DEG are identified by univariate approaches such as Significance Analysis of Microarrays (SAM) or Linear Models for Microarray Data (LIMMA) for processing cDNA microarrays, and differential gene expression analysis based on the negative binomial distribution (DESeq) or Empirical analysis of Digital Gene Expression data in R (edgeR) for RNA-seq profiling.

Results: Here we present a new geometrical multivariate approach to identify DEG called the Characteristic Direction. We demonstrate that the Characteristic Direction method is significantly more sensitive than existing methods for identifying DEG in the context of transcription factor (TF) and drug perturbation responses over a large number of microarray experiments. We also benchmarked the Characteristic Direction method using synthetic data, as well as RNA-Seq data. A large collection of microarray expression data from TF perturbations (73 experiments) and drug perturbations (130 experiments) extracted from the Gene Expression Omnibus (GEO), as well as an RNA-Seq study that profiled genome-wide gene expression and STAT3 DNA binding in two subtypes of diffuse large B-cell lymphoma, were used for benchmarking the method using real data. ChIP-Seq data identifying DNA binding sites of the perturbed TFs, as well as known drug targets of the perturbing drugs, were used as a prior-knowledge silver standard for validation. In all cases, the Characteristic Direction DEG calling method outperformed other methods. We find that when drugs are applied to cells in various contexts, the proteins that interact with the drug targets are differentially expressed, and more of the corresponding genes are discovered by the Characteristic Direction method. In addition, we show that the Characteristic Direction conceptualization can be used to perform improved gene set enrichment analyses when compared with the gene-set enrichment analysis (GSEA) and the hypergeometric test.

Conclusions: The application of the Characteristic Direction method may shed new light on relevant biological mechanisms that would have remained undiscovered by the current state-of-the-art DEG methods. The method is freely accessible via various open source code implementations using four popular programming languages: R, Python, MATLAB and Mathematica, all available at: http://www.maayanlab.net/CD.

Highlights

Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome-wide expression profiling
Commonly used methods include Welch’s t-test, Significance Analysis of Microarrays (SAM) [5], Linear Models for Microarray Data (LIMMA) [6], and, in the case of high-throughput sequencing data, differential gene expression analysis based on the negative binomial distribution (DESeq) [7]
We find that the direction normal to the separating hyperplane provides a simple geometrical conceptualization of the differential expression, which naturally leads to extensions of the approach, such as a new formulation of gene set enrichment analysis

Summary

Introduction

Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome-wide expression profiling. Typically, DEG are identified by univariate approaches such as Significance Analysis of Microarrays (SAM) or Linear Models for Microarray Data (LIMMA) for processing cDNA microarrays, and differential gene expression analysis based on the negative binomial distribution (DESeq) or Empirical analysis of Digital Gene Expression data in R (edgeR) for RNA-seq profiling. After estimating the relative or absolute expression level of all transcripts, the next step is to test statistical hypotheses [1]. These hypotheses concern the difference between two biological conditions, for example, normal versus diseased tissue, or perturbed versus unperturbed cells. Commonly used methods for this task include Welch’s t-test, Significance Analysis of Microarrays (SAM) [5], Linear Models for Microarray Data (LIMMA) [6], and, in the case of high-throughput sequencing data, differential gene expression analysis based on the negative binomial distribution (DESeq) [7]. Since there are significant statistical dependencies between the expression levels of most genes, multivariate approaches may be more appropriate for genome-wide profiling analyses that identify DEG; for example, multivariate analysis is able to find significant differential expression in cases where there is no marginal differential expression for individual genes (Figure 1).
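
The contrast between marginal and multivariate differential expression can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes that the separating hyperplane is obtained from a linear discriminant with a shrinkage-regularized pooled covariance, and the function names (separating_direction, welch_t), the shrinkage value, and the two-gene synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def separating_direction(control, perturbed, shrinkage=0.1):
    """Unit normal to a regularized linear-discriminant hyperplane.

    Illustrative only: the shrinkage scheme and its default value are
    assumptions, not taken from the paper.
    control, perturbed: arrays of shape (n_samples, n_genes).
    """
    mean_diff = perturbed.mean(axis=0) - control.mean(axis=0)
    # Pooled within-class covariance of the centred samples.
    centred = np.vstack([control - control.mean(axis=0),
                         perturbed - perturbed.mean(axis=0)])
    cov = centred.T @ centred / (centred.shape[0] - 2)
    # Shrink toward a scaled identity so the matrix stays invertible.
    n_genes = cov.shape[0]
    target = np.trace(cov) / n_genes * np.eye(n_genes)
    cov_reg = (1.0 - shrinkage) * cov + shrinkage * target
    direction = np.linalg.solve(cov_reg, mean_diff)
    return direction / np.linalg.norm(direction)


def welch_t(a, b):
    """Welch's t statistic of b versus a, computed along axis 0."""
    se2 = a.var(axis=0, ddof=1) / a.shape[0] + b.var(axis=0, ddof=1) / b.shape[0]
    return (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(se2)


# Hypothetical synthetic data: two strongly correlated genes whose means
# shift in opposite directions, so each marginal shift is small relative
# to that gene's own variance.
rng = np.random.default_rng(0)
gene_cov = np.array([[1.0, 0.95], [0.95, 1.0]])
control = rng.multivariate_normal([0.0, 0.0], gene_cov, size=20)
perturbed = rng.multivariate_normal([0.5, -0.5], gene_cov, size=20)

direction = separating_direction(control, perturbed)
t_marginal = welch_t(control, perturbed)  # one statistic per gene
t_projected = welch_t(control @ direction, perturbed @ direction)

print("direction:", direction)
print("per-gene Welch t:", t_marginal)
print("Welch t along direction:", t_projected)
```

Because the direction is estimated from the same samples that are then projected onto it, the statistic along the direction is optimistic; it is shown only to make the geometry concrete, not as a significance test.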
