GScluster: network-weighted gene-set clustering analysis

Sora Yoon, Seon-Young Kim, Seon-Kyu Kim, Dougu Nam, Sang-Mun Chi, Jinhwan Kim, Bukyung Baik

doi:10.1186/s12864-019-5738-6

Abstract

Background: Gene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data. However, GSA often yields a long list of gene-sets, necessitating efficient post-processing for improved interpretation. Existing methods summarize GSA results by clustering the gene-sets based on the extent of their overlap, without considering interactions between gene-sets.

Results: Here, we present a novel network-weighted gene-set clustering that incorporates both the gene-set overlap and protein-protein interaction (PPI) networks. Three examples are demonstrated for microarray gene expression, GWAS summary, and RNA-sequencing data, to which different GSA methods were applied. These examples, as well as a global analysis, show that the proposed method increases the PPI densities and functional relevance of the resulting clusters. Additionally, distinct properties of gene-set distance measures are compared. The methods are implemented as an R/Shiny package, GScluster, that provides gene-set clustering and diverse functions for visualization of gene-sets and PPI networks.

Conclusions: Network-weighted gene-set clustering provides functionally more relevant gene-set clusters and related network analysis.

Highlights

Gene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data
A main goal of our analysis is to identify functionally relevant gene-set clusters from a long list of gene-sets; the networks between genes in our analysis can be any kind of functional interaction such as gene co-expression, co-occurrence in the literature, evolutionary distance, physical contact, or their combinations, which were all denoted as protein-protein interaction (PPI) in this article
We introduced a PPI-weighted gene-set distance that incorporates both the overlapping genes and the PPIs between two gene-sets. PPI-weighted Meet/Min (pMM) was compared with two existing distance measures, Meet/Min (MM) and kappa distance (KAPPA), in clustering a large collection of gene-sets (MSigDB C2). As expected, pMM clusters exhibited systematically higher PPI densities than those obtained using the MM or KAPPA distances. pMM also captured biologically more meaningful clusters, as shown in three analysis examples
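
To make the idea of an overlap-based distance and a network-weighted adjustment concrete, here is a minimal Python sketch. The Meet/Min distance follows its usual definition, 1 - |A ∩ B| / min(|A|, |B|). The function `ppi_weighted_distance`, its `alpha` parameter, and the dictionary layout of `ppi` are hypothetical simplifications introduced only for illustration; they are not the pMM formula from the paper and not the GScluster API.

```python
from itertools import product


def meet_min_distance(a, b):
    """Meet/Min distance: 1 - |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("gene-sets must be non-empty")
    return 1.0 - len(a & b) / min(len(a), len(b))


def ppi_weighted_distance(a, b, ppi, alpha=1.0):
    """Illustrative network-weighted distance (an assumption, NOT the published pMM).

    `ppi` maps frozenset({gene_x, gene_y}) to an interaction score in [0, 1].
    The Meet/Min distance is shrunk in proportion to the mean interaction
    score between the genes unique to each set, scaled by `alpha` in [0, 1].
    """
    a, b = set(a), set(b)
    mm = meet_min_distance(a, b)
    only_a, only_b = a - b, b - a
    if not only_a or not only_b:
        # One set contains the other: no cross-set gene pairs to weigh.
        return mm
    total = sum(
        ppi.get(frozenset((x, y)), 0.0) for x, y in product(only_a, only_b)
    )
    mean_score = total / (len(only_a) * len(only_b))
    return max(0.0, mm * (1.0 - alpha * mean_score))


# Toy example with made-up gene names and interaction scores.
set_a = {"G1", "G2", "G3"}
set_b = {"G3", "G4"}
toy_ppi = {
    frozenset(("G1", "G4")): 1.0,
    frozenset(("G2", "G4")): 0.5,
}

mm = meet_min_distance(set_a, set_b)
weighted = ppi_weighted_distance(set_a, set_b, toy_ppi)
```

In this toy case the two sets share one gene, so the overlap-only distance is 0.5, while the interactions between the non-shared genes pull the weighted distance lower. This mirrors the qualitative behaviour described above: gene-sets that are densely connected in the network are placed closer together than their overlap alone would suggest.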

Summary

Introduction

Gene-set analysis (GSA) covers a broad category of methods used to identify relevant biological pathways or functions from omics data such as microarray or high-throughput sequencing data [1,2,3,4]. GSA typically yields tens to hundreds of significant gene-sets without indicating how they interact with each other, which makes it difficult to identify core pathways or functional groups. Annotation databases such as Gene Ontology and KEGG [5, 6] partially address this issue by providing parent-offspring relations between annotation terms when used for GSA. We propose to use a network-weighted distance for clustering gene-sets and present an R/Shiny package, GScluster (https://github.com/unistbig/GScluster), for clustering



Journal: BMC Genomics	Publication Date: May 9, 2019
Citations: 12	License type: open-access
