Abstract
Background: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms that are statistically overrepresented within a large gene set are typically used to describe the main functional attributes of that set. However, these lists of overrepresented GO terms are often too large and contain redundant, overlapping GO terms, which hinders informative functional interpretation.

Results: We developed GOMCL to reduce redundancy and summarize lists of GO terms effectively and informatively. This lightweight Python toolkit efficiently identifies clusters within a list of GO terms using the Markov Clustering (MCL) algorithm, based on the overlap of gene members between GO terms. GOMCL facilitates biological interpretation of a large number of GO terms by condensing them into GO clusters representing non-overlapping functional themes. It visualizes GO clusters as a heatmap, as networks based on either overlap of members or hierarchy among GO terms, and as tables with depth and cluster information for each GO term. Each GO cluster generated by GOMCL can be evaluated and further divided into non-overlapping sub-clusters using the GOMCL-sub module. The outputs from both GOMCL and GOMCL-sub can be imported into Cytoscape for additional visualization effects.

Conclusions: GOMCL is a convenient toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. GOMCL helps researchers reduce the time spent on manual curation of large lists of GO terms, minimize biases introduced by redundant GO terms in data interpretation, and batch-process multiple GO enrichment datasets. A user guide, a test dataset, and the source code of GOMCL are available at https://github.com/Guannan-Wang/GOMCL and www.lsugenomics.org.
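The idea of linking GO terms by the overlap of their gene members can be illustrated with a minimal sketch. The two measures below (Jaccard index and overlap coefficient) are common set-overlap scores chosen here for illustration; the abstract does not state which measure or cutoff GOMCL applies. The function names, the 0.5 cutoff, and the term and gene labels are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either term."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Overlap coefficient: shared genes divided by the smaller term's size."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def build_edges(term_genes, similarity=jaccard, cutoff=0.5):
    """Return (term1, term2, score) for every term pair scoring >= cutoff.

    term_genes maps a GO term label to the set of genes annotated to it.
    The cutoff value is an assumption made for this sketch.
    """
    edges = []
    for t1, t2 in combinations(sorted(term_genes), 2):
        score = similarity(term_genes[t1], term_genes[t2])
        if score >= cutoff:
            edges.append((t1, t2, score))
    return edges


# Hypothetical GO terms and gene members, for illustration only.
example_terms = {
    "GO:TERM_A": {"g1", "g2", "g3", "g4"},
    "GO:TERM_B": {"g2", "g3", "g4", "g5"},
    "GO:TERM_C": {"g8", "g9"},
}

edges = build_edges(example_terms, similarity=jaccard, cutoff=0.5)
for t1, t2, score in edges:
    print(t1, t2, round(score, 2))
```

In this toy input only the first two terms share genes (3 of 5 in total), so they are the only pair connected; the resulting weighted edge list is the kind of network a clustering algorithm such as MCL would take as input.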
Highlights
Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses
As a proof of concept, we ran GOMCL on a list of over-represented GO terms identified from genes differentially expressed between two GFP-tagged cell populations of Arabidopsis roots in a published study [22] to highlight the functional use of GOMCL
Cluster identification: We used only GO terms with fewer than 3500 genes annotated to them in Arabidopsis, to allow identification of specific functional traits associated with the published study
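The size filter described above can be sketched as a one-step selection over per-term annotation counts. This is only an illustration of the idea, not GOMCL's own code: the function name, the input layout, and the term labels and counts are hypothetical, and only the threshold of 3500 comes from the text.

```python
def filter_terms_by_size(term_sizes, max_genes=3500):
    """Keep GO terms annotated with fewer than max_genes genes.

    term_sizes maps a GO term label to the number of genes annotated
    to that term in the reference genome.
    """
    return {term: n for term, n in term_sizes.items() if n < max_genes}


# Hypothetical annotation counts, for illustration only.
term_sizes = {
    "GO:BROAD_TERM": 12000,    # very general term, removed
    "GO:AT_THRESHOLD": 3500,   # not fewer than 3500, removed
    "GO:SPECIFIC_TERM": 240,   # specific term, kept
}

kept = filter_terms_by_size(term_sizes)
print(sorted(kept))
```

Dropping very broad terms in this way keeps the clusters focused on specific functions rather than on catch-all categories.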
Summary
Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. This tool does not accept direct output files from other commonly used GO enrichment tools such as BiNGO, g:Profiler, or agriGO. Both tools fall short at processing in parallel the large number of distinct sets of gene functions often encountered in large-scale -omics experiments. To address these limitations and generate a similarity-based functional GO network, we developed a new toolkit, GOMCL, that applies the Markov Clustering (MCL) algorithm [13,14,15] to identify cluster structures in GO networks in an unbiased manner. GOMCL is a user-friendly Python toolkit that offers multiple visualization schemes and enables batch processing of large GO datasets to mine for functionally significant attributes.
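For readers unfamiliar with MCL, the general algorithm alternates two steps on a column-stochastic matrix: expansion (squaring the matrix, which spreads flow along paths) and inflation (raising entries to a power and renormalizing, which strengthens strong flows and weakens weak ones). The sketch below is a minimal textbook-style version in pure Python, not GOMCL's implementation; the function names, the inflation value of 2.0, the convergence tolerance, and the toy network are assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a weighted undirected graph given as a square matrix."""
    n = len(adjacency)
    # Add self-loops, a standard step that stabilizes the iteration.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion
        inflated = normalize_columns(  # inflation
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy network: nodes 0-2 form a triangle, nodes 3-4 form a pair.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters = mcl(toy)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

In a GO setting, the nodes would be GO terms and the edge weights a gene-overlap score between terms. Larger inflation values generally yield finer-grained clusters, which is why that parameter is usually exposed to the user.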