Abstract
BackgroundGene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data. However, GSA often yields a long list of gene-sets, necessitating efficient post-processing for improved interpretation. Existing methods cluster the gene-sets based on the extent of their overlap to summarize the GSA results without considering interactions between gene-sets.ResultsHere, we presented a novel network-weighted gene-set clustering that incorporates both the gene-set overlap and protein-protein interaction (PPI) networks. Three examples were demonstrated for microarray gene expression, GWAS summary, and RNA-sequencing data to which different GSA methods were applied. These examples as well as a global analysis show that the proposed method increases PPI densities and functional relevance of the resulting clusters. Additionally, distinct properties of gene-set distance measures were compared. The methods are implemented as an R/Shiny package GScluster that provides gene-set clustering and diverse functions for visualization of gene-sets and PPI networks.ConclusionsNetwork-weighted gene-set clustering provides functionally more relevant gene-set clusters and related network analysis.
Highlights
Gene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data
A main goal of our analysis is to identify functionally relevant gene-set clusters from a long list of gene-sets; the networks between genes in our analysis can be any kind of functional interaction such as gene co-expression, co-occurrence in the literature, evolutionary distance, physical contact, or their combinations, which were all denoted as protein-protein interaction (PPI) in this article
We introduced a PPI-weighted gene-set distance that incorporates both the overlapping genes and PPIs between two gene-sets. PPI-weighted Meet/Min (pMM) was compared with existing distance measures, Meet/Min (MM) and kappa distance, in clustering a large collection of gene-sets (MSigDB C2), where pMM clusters, as expected, exhibited systematically higher PPI densities than those obtained using MM or KAPPA distances. pMM enabled to capture biologically more meaningful clusters as shown in three analysis examples
Summary
Gene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data. Gene-set analysis (GSA) covers a broad category of methods used to identify relevant biological pathways or functions from omics data such as microarray or high throughput sequencing data [1,2,3,4]. GSA yields tens to hundreds of significant gene-sets without indicating how they interact with each other, rendering it difficult to identify core pathways or functional groups. Annotation databases such as Gene Ontology and KEGG [5, 6] partially address this issue by providing parent-offspring relations between annotation terms when used for GSA. We propose to use a network-weighted distance for clustering gene-sets and present an R/Shiny package, GScluster (https://github.com/unistbig/GScluster), for clustering
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.