Abstract

Measuring the semantic similarity between Gene Ontology (GO) terms is an essential step in functional bioinformatics research. We implemented a software tool named GOGO for calculating the semantic similarity between GO terms. GOGO combines the advantages of information-content-based methods, such as Resnik’s, and hybrid methods, such as Wang’s. Moreover, GOGO is relatively fast and does not need to calculate information content (IC) from a large gene annotation corpus, yet it retains the advantage of using IC. This is achieved by considering the number of child nodes in the GO directed acyclic graphs when calculating the semantic contribution that an ancestor node gives to its descendant nodes. GOGO can calculate functional similarities between genes and then cluster genes based on those similarities. Evaluations performed on multiple pathways retrieved from the Saccharomyces Genome Database (SGD) show that GOGO can accurately and robustly cluster genes based on functional similarities. We release GOGO as a web server and as a stand-alone tool, which allows convenient execution for a small number of GO terms as well as integration into bioinformatics pipelines for large-scale calculations. GOGO can be freely accessed or downloaded from http://dna.cs.miami.edu/GOGO/.
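The idea of weighting an ancestor's semantic contribution by its number of child nodes can be illustrated with a minimal sketch. It follows the general shape of Wang's graph-based method (each term contributes 1.0 to itself, and an ancestor receives the maximum weighted contribution over its paths), but replaces the fixed edge weight with one that shrinks as the parent has more children. The toy DAG, the function names, and the weight formula `1 / (c + n_children) + d` with its constants are assumptions made for illustration only; they are not taken from the GOGO implementation.

```python
from collections import defaultdict

# Toy DAG (hypothetical): each term maps to its list of parents.
# All edges are treated as the same relation type for simplicity.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["a"],
}


def child_counts(parents):
    """Count the direct children of every term."""
    counts = defaultdict(int)
    for plist in parents.values():
        for p in plist:
            counts[p] += 1
    return counts


def edge_weight(n_children, c=1.0, d=0.4):
    """Illustrative weight (assumed form): smaller when the parent has
    more children, i.e. when the parent is less specific."""
    return 1.0 / (c + n_children) + d


def s_values(term, parents, counts):
    """Semantic contribution of each ancestor (and the term itself) to `term`.

    The term contributes 1.0 to itself; an ancestor receives the maximum,
    over all paths, of the product of edge weights along the path.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        node = stack.pop()
        for p in parents[node]:
            candidate = edge_weight(counts[p]) * s[node]
            if candidate > s.get(p, 0.0):
                s[p] = candidate
                stack.append(p)  # re-propagate the improved value upward
    return s


def similarity(t1, t2, parents):
    """Wang-style similarity: shared contributions over total contributions."""
    counts = child_counts(parents)
    s1 = s_values(t1, parents, counts)
    s2 = s_values(t2, parents, counts)
    shared = set(s1) & set(s2)
    numerator = sum(s1[t] + s2[t] for t in shared)
    denominator = sum(s1.values()) + sum(s2.values())
    return numerator / denominator


if __name__ == "__main__":
    for pair in [("c", "e"), ("c", "d"), ("c", "b")]:
        print(pair, round(similarity(*pair, PARENTS), 3))
```

Because the weight depends only on graph topology, no annotation corpus is needed in this sketch; the child count plays the role of a rough specificity signal.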

Highlights

  • Inferring semantic similarities between Gene Ontology (GO)[1] terms is a fundamental component of functional bioinformatics research, with applications such as gene clustering[2,3,4], protein function prediction[5,6] and validation of gene-gene interactions[7,8,9]

  • To avoid over-reliance on the most informative common ancestor (MICA), Couto et al. designed GraSM, which can be applied to any information content (IC)-based method and calculates semantic similarity from the average IC of the disjunctive common ancestors (DCAs) instead of the IC of the MICA

  • We developed an improved hybrid algorithm, GOGO, that calculates semantic similarities between GO terms based on the topology of the GO directed acyclic graphs (DAGs)


Summary

Introduction

Inferring semantic similarities between Gene Ontology (GO)[1] terms is a fundamental component of functional bioinformatics research, with applications such as gene clustering[2,3,4], protein function prediction[5,6] and validation of gene-gene interactions[7,8,9]. Taking protein function prediction as an example, the predicted functions of a large number of proteins (e.g., ~100,000 proteins for CAFA2[6]), expressed as GO terms, commonly need to be evaluated against GO terms obtained by experimental approaches. This process usually requires calculating the similarities between a huge number of GO-term pairs. To avoid over-reliance on the most informative common ancestor (MICA), Couto et al. designed GraSM, which can be applied to any information content (IC)-based method and calculates semantic similarity from the average IC of the disjunctive common ancestors (DCAs) instead of the IC of the MICA. To make the calculation of semantic similarity more efficient, Zhang and Lai built GraSM using the exclusively inherited shared information (EISI) that could be applied to any IC-based method.
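For readers unfamiliar with the MICA, the following minimal sketch shows how a Resnik-style, IC-based similarity is typically computed: the IC of a term is the negative log of its annotation probability, and the similarity of two terms is the IC of their most informative common ancestor. The toy DAG and the probabilities in `PROB` are invented for illustration; in practice they would be estimated from a gene annotation corpus. GraSM is not implemented here; as described above, it would replace the single MICA with an average over the disjunctive common ancestors.

```python
import math

# Toy DAG (hypothetical): each term maps to its list of parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["a"],
}

# Invented annotation probabilities; a real pipeline would derive these
# from how often each term (or its descendants) annotates genes.
PROB = {"root": 1.0, "a": 0.5, "b": 0.4, "c": 0.1, "d": 0.2, "e": 0.1}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(term, prob):
    """IC(t) = -log p(t); rarer terms are more informative."""
    return -math.log(prob[term])


def resnik(t1, t2, parents, prob):
    """Return (MICA, IC of MICA) for two terms."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    mica = max(common, key=lambda t: information_content(t, prob))
    return mica, information_content(mica, prob)


if __name__ == "__main__":
    print(resnik("c", "e", PARENTS, PROB))
    print(resnik("c", "b", PARENTS, PROB))
```

The dependence on `PROB` is the cost that IC-based methods carry: the probabilities must be recomputed whenever the annotation corpus changes.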
