Abstract
BackgroundGene-set analysis tools, which make use of curated sets of molecules grouped based on their shared functions, aim to identify which gene-sets are over-represented in the set of features that have been associated with a given trait of interest. Such tools are frequently used in gene-centric approaches derived from RNA-sequencing or microarrays such as Ingenuity or GSEA, but they have also been adapted for interval-based analysis derived from DNA methylation or ChIP/ATAC-sequencing. Gene-set analysis tools return, as a result, a list of significant gene-sets. However, while these results are useful for the researcher in the identification of major biological insights, they may be complex to interpret because many gene-sets have largely overlapping gene contents. Additionally, in many cases the result of gene-set analysis consists of a large number of gene-sets making it complicated to identify the major biological insights.ResultsWe present GeneSetCluster, a novel approach which allows clustering of identified gene-sets, from one or multiple experiments and/or tools, based on shared genes. GeneSetCluster calculates a distance score based on overlapping gene content, which is then used to cluster them together and as a result, GeneSetCluster identifies groups of gene-sets with similar gene-set definitions (i.e. gene content). These groups of gene-sets can aid the researcher to focus on such groups for biological interpretations.ConclusionsGeneSetCluster is a novel approach for grouping together post gene-set analysis results based on overlapping gene content. GeneSetCluster is implemented as a package in R. The package and the vignette can be downloaded at https://github.com/TranslationalBioinformaticsUnit
Highlights
Gene-set analysis tools, which make use of curated sets of molecules grouped based on their shared functions, aim to identify which gene-sets are overrepresented in the set of features that have been associated with a given trait of interest
Modern gene-set analysis (GSA) [1] are standard tools aimed to provide biological insights derived from the list of genes associated with a trait of interest
Tools such as Ingenuity Pathway Analysis (IPA) [2], GREAT [3], GSEA [4], among others, make use of curated collections of gene-sets such as Gene Ontology [5] or KEGG [6] to identify those relevant gene-sets associated with the trait of interest
Summary
We present GeneSetCluster, a novel approach which allows clustering of identified gene-sets, from one or multiple experiments and/or tools, based on shared genes. GeneSetCluster calculates a distance score based on overlapping gene content, which is used to cluster them together and as a result, GeneSetCluster identi‐ fies groups of gene-sets with similar gene-set definitions (i.e. gene content). These groups of gene-sets can aid the researcher to focus on such groups for biological interpretations
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.